
Improving Text Recognition capabilities using Transformer

In this project, we’ll first improve the output of a CRNN model by decoding its predictions with beam search. Then we’ll improve text recognition further using a state-of-the-art transformer model: TrOCR from Microsoft. We’ll start with the base TrOCR model and run inference on some images, and then we’ll fine-tune that base model on the IAM dataset using HuggingFace tools. Once training is complete, we’ll use the TrOCR handwritten model (which is fine-tuned on the IAM lines dataset).

What will you Learn in the Project?

  1. Running inference with a CRNN model using beam search decoding
  2. Understanding the HuggingFace model and processor modules
  3. Running inference with the TrOCR model using HuggingFace libraries
  4. Fine-tuning the OCR model on the IAM dataset

Tools & Technologies Used

  1. Google Colab 
  2. HuggingFace (transformers, datasets) 
  3. PyTorch (Dataset) 
  4. sklearn 
  5. Pandas


Prerequisites

  1. Working knowledge of tools such as TensorFlow and the HuggingFace libraries (transformers, datasets)
  2. Understanding of the Dataset module of the PyTorch library
  3. Good theoretical understanding of the Transformer (encoder-decoder) architecture and of text generation concepts such as beam search
  4. Understanding of the TrOCR model architecture (the Transformer-based OCR model)

Tasks Performed

Task-1: Create an HTR (Handwritten Text Recognition) model that decodes with beam search (SimpleHTR)
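
To make the idea behind Task-1 concrete, here is a minimal pure-Python sketch of CTC prefix beam search next to greedy (best-path) decoding. It is not SimpleHTR's implementation. The function names, the toy probability matrix, and the convention that the CTC blank is the last class index are assumptions made for illustration.

```python
from collections import defaultdict


def ctc_greedy(probs, alphabet):
    """Best-path decoding: take the argmax per time step, then collapse repeats and blanks."""
    blank = len(alphabet)  # assumption: the blank is the last class index
    out, prev = [], blank
    for row in probs:
        c = max(range(len(row)), key=row.__getitem__)
        if c != blank and c != prev:
            out.append(alphabet[c])
        prev = c
    return "".join(out)


def ctc_beam_search(probs, alphabet, beam_width=3):
    """CTC prefix beam search over a T x (len(alphabet) + 1) matrix of probabilities."""
    blank = len(alphabet)
    # Each prefix tracks two scores: paths ending in blank, paths ending in a character.
    beams = {(): (1.0, 0.0)}
    for row in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_char) in beams.items():
            total = p_blank + p_char
            # Emitting a blank keeps the prefix unchanged.
            next_beams[prefix][0] += total * row[blank]
            # Repeating the last character (no blank in between) also keeps the prefix.
            if prefix:
                next_beams[prefix][1] += p_char * row[prefix[-1]]
            # Extending the prefix by one character.
            for c in range(len(alphabet)):
                if prefix and c == prefix[-1]:
                    # A doubled character is only possible after a blank.
                    next_beams[prefix + (c,)][1] += p_blank * row[c]
                else:
                    next_beams[prefix + (c,)][1] += total * row[c]
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[c] for c in best)


# Toy input: one character "a" plus the blank, two time steps.
toy = [[0.4, 0.6], [0.4, 0.6]]
print(repr(ctc_greedy(toy, "a")))       # ''
print(repr(ctc_beam_search(toy, "a")))  # 'a'
```

On the toy input, the single most likely path is blank-blank (0.36), so greedy decoding returns an empty string, while the three paths that collapse to "a" sum to 0.64, which beam search picks up.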

Task-2: Run inference with the TrOCR (transformer) model on the test images using the HuggingFace transformers library.
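
As a rough sketch of what Task-2 involves, the snippet below loads a TrOCR processor and model with the HuggingFace transformers library and decodes one text-line image. The helper names `load_trocr` and `recognize_line`, the default checkpoint `microsoft/trocr-base-handwritten`, and the image path in the usage comment are assumptions for illustration; the project may use a different checkpoint.

```python
def load_trocr(checkpoint="microsoft/trocr-base-handwritten"):
    """Load the TrOCR processor and model (weights are downloaded on first use)."""
    # Imported lazily so recognize_line can be used without loading transformers first.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model


def recognize_line(image, processor, model):
    """Return the text predicted for a single text-line image (an RGB PIL image)."""
    # The processor resizes and normalizes the image into a tensor for the encoder.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # The decoder generates token ids autoregressively.
    generated_ids = model.generate(pixel_values)
    # The processor's tokenizer turns the token ids back into a string.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


# Usage (hypothetical image path; needs transformers, torch and Pillow installed):
#   from PIL import Image
#   processor, model = load_trocr()
#   image = Image.open("line.png").convert("RGB")
#   print(recognize_line(image, processor, model))
```

Keeping loading and recognition separate means the model is loaded once and reused for every test image.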

Task-3: Fine-tune the TrOCR model on the lines set of the IAM dataset
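
One small step that fine-tuning an encoder-decoder model like this typically needs is preparing the target labels: the tokenized transcription is padded to a fixed length, and the padding positions are replaced with -100, the default `ignore_index` of PyTorch's cross-entropy loss, so that padding does not contribute to the loss. The sketch below shows only that step in plain Python. The function name `build_labels` and the example token ids are hypothetical; in practice the ids would come from the processor's tokenizer inside a PyTorch Dataset.

```python
def build_labels(token_ids, pad_token_id, max_length, ignore_index=-100):
    """Pad token ids to max_length and mask the padding so the loss ignores it."""
    ids = list(token_ids)
    if len(ids) > max_length:
        # Truncating here would cut off the end-of-sequence token, so fail loudly instead.
        raise ValueError("transcription is longer than max_length")
    padded = ids + [pad_token_id] * (max_length - len(ids))
    return [ignore_index if t == pad_token_id else t for t in padded]


# Hypothetical token ids for one transcription (start token, one word, end token).
print(build_labels([0, 31414, 2], pad_token_id=1, max_length=6))
# [0, 31414, 2, -100, -100, -100]
```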

Task-4: Run inference with the fine-tuned TrOCR IAM model on the test images


Skills you will develop

Clone a pre-trained model from GitHub and run inference using the CLI

Load images and run inference on them using the HuggingFace transformers library

Train an encoder-decoder transformer-based model for OCR
