How to Retrain Your Text Elements with AskUI OCR Teaching App

October 23, 2024
Tutorials
The image shows a man with light skin, short gray hair, and a contemplative expression, resting his chin on his hand while looking upward. The background is a dark purple, and large neon-yellow text with a pink outline reads, "Retraining OCR Vision Model." On the top left is a green arrow pointing to the text. At the bottom right corner is an icon resembling a camera and a video bookmark symbol in light gray. The image gives the impression of a tutorial or educational content about retraining an OCR (Optical Character Recognition) vision model.

User interface automation is hard, especially when you rely on Optical Character Recognition (OCR) to target text.

AskUI is no stranger to this challenge. While our base vision model works well for most of the text you want to target, it is sometimes flaky with specific fonts.

The good news is that the OCR recognition rate can be greatly increased by retraining our base model with your specific text elements. For this we created the AskUI Teaching App: you upload a screenshot, which gets analyzed, and then you check each detected text element and correct any that were recognized incorrectly.

This tutorial will walk you through the process step-by-step.

Prerequisites

Create a Screenshot with the Flaky Text Elements

Flakiness often happens in specific situations, and you probably know exactly which ones. Take a screenshot of the whole screen for each of those situations and save the screenshots in a folder. Here is the one I use for this tutorial:

The image shows a webpage titled "AskUI Practice Page" featuring a calculator interface. At the top, there are several navigation tabs labeled "Calculator," "Register," "Images," "Team," and "Android," with "Calculator" highlighted in purple. Below, a basic calculator is displayed with buttons for numbers 0–9, arithmetic operators (+, -, *, /), parentheses, percentage, decimal point, and an equals sign. The display screen currently shows "0." A "Switch to Dark" button is in the top-right corner of the page, and at the bottom is a large rectangular "Drop here" box next to the AskUI logo. The URL at the top is "askui.github.io/askui-practice-page/".

Start the AskUI OCR Teaching Application

The AskUI OCR Teaching Application has to be started from the AskUI Development Environment (ADE). First activate the ADE in a terminal:

askui-shell

Then import the experimental commands:

AskUI-ImportExperimentalCommands

Now start the OCR Teaching Application:

Start-AskUIOCRTeaching

This should bring up the following application window:

The image shows the interface of the "AskUI OCR Teaching (Experimental)" application. At the top, there are three input fields labeled "BASE URL," "WORKSPACE ID," and "TOKEN." The BASE URL is pre-filled with "https://inference.askui.com." To the right, there are two toggle switches labeled "Trained Model" and "Word-Level Model," both in the off position (gray). A "Copy Model" button is positioned on the far right. In the center of the interface, there is a message that reads "No image selected," with a purple camera icon below it, indicating an option to upload or select an image. The background is light gray, giving a clean, minimalistic appearance.

Create Credentials in AskUI Studio

The model you fine-tune is our base model. The retrained model is saved to your workspace and can be retrained if you run into flakiness again.

We recommend creating a new access token for this in AskUI Studio under Access token, so that a leak does not expose a token you use elsewhere.

Enter the workspace ID and the access token into the fields in the AskUI OCR Teaching Application.

The image displays a digital interface indicating the successful creation of an Access Token for AskUI configuration. At the top, a bold heading reads "Access Token Created," followed by instructions to add it to the AskUI environment. There are two sections highlighting the Access Token and Workspace Id with orange-red filler text, each with a note to save the information before proceeding. Below, a suggested command for AskUI-Shell usage is shown, along with similar orange-red highlighted placeholders for WorkspaceId and Token. A small "OKAY" button is visible at the bottom right.
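
If you later use the same workspace ID and token in your AskUI project, one way to keep them out of your source code is to read them from environment variables. The following is a minimal sketch under assumptions, not part of the Teaching App: readCredentials is a hypothetical helper, and the variable names ASKUI_WORKSPACE_ID and ASKUI_TOKEN are assumed here, so check them against your own setup.

```typescript
// Hypothetical helper for illustration; the env variable names are assumptions.
interface AskuiCredentials {
  workspaceId: string;
  token: string;
}

// Reads both values from an env-like object and fails early if one is missing.
function readCredentials(env: Record<string, string | undefined>): AskuiCredentials {
  const workspaceId = env['ASKUI_WORKSPACE_ID']?.trim();
  const token = env['ASKUI_TOKEN']?.trim();
  if (!workspaceId || !token) {
    throw new Error('Set ASKUI_WORKSPACE_ID and ASKUI_TOKEN before starting.');
  }
  return { workspaceId, token };
}

// Usage in a Node.js project: const credentials = readCredentials(process.env);
```

Failing early with a clear message makes a missing or empty token easier to spot than an authentication error later in a run.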

Upload the Screenshot

Finally, it is time to import the screenshot and get a list of all the detected text elements. Click on the camera icon and import your screenshot. It may take a little while before the text elements appear on the right side.

The image depicts an interface of the AskUI OCR Teaching tool. On the left, there's a screenshot of a webpage displaying a basic calculator with buttons and tabs labeled "Calculator," "Register," "Images," "Team," and "Android." On the right, there is a list with text options like "Chrome," "File," "Edit," "View," "History," "Bookmarks," and "Profiles." Above, there are toggle switches labeled "Trained Model" and "Word-Level Model" which are turned off, and a button labeled "Copy Model." The workspace ID and base URL are visible at the top left corner of the interface.

Retrain the Model

Switch on the Trained Model toggle so that you can start training corrections.

Scroll through the text elements and check whether they were recognized correctly. If you find one that is wrong, as in the following screenshot, correct it and click on Train Correction. Training may take a few seconds. Once it completes, the detection runs again and you can check whether the recognition is now correct.

Do not hesitate to retrain again if the recognition is still wrong!

The image shows a user interface of the AskUI OCR Teaching (Experimental) software on a light purple background. At the top, there are sections labeled "BASE URL," "WORKSPACE ID," and "TOKEN," followed by a toggle switch labeled "Trained Model," which is turned on, and another switch labeled "Word-Level Model" that is off. The main view contains a screenshot of a web page titled "AskUI Practice Page," featuring a digital calculator with a grid of numbers and function buttons. On the right side, there are several blurred images with corresponding time stamps and text snippets, each with a "Train Correction" button next to them. One of the images is highlighted with a red border, emphasizing it for further focus or correction.

Copy the New Model Configuration to Your AskUI Project

Once you are happy with all the recognized text elements, you can add the model configuration to your AskUI Project. Click on Copy Model to copy the configuration to your clipboard.

Head over to your helpers/askui-helper.ts and find the line containing aui = await UiControlClient.build({. Insert the model composition as in the following code snippet:

...
  aui = await UiControlClient.build({
    ...
    modelComposition: [
      <Here goes your model composition>,
      // This is important!
      // Otherwise only text will be detected.
      {"task":"od","architecture":"yolo","version":"6","interface":"c9","useCase":"default","tags":[]}
    ]
  });
...
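
Because it is easy to forget the object detection entry when pasting, you could guard against that with a small helper. This is only a sketch under assumptions: withObjectDetection and ModelCompositionEntry are hypothetical names, and the entry shape is inferred from the snippet above rather than taken from an official AskUI type. It presumes the entries you copied from the Teaching App have the same fields.

```typescript
// Entry shape inferred from the snippet above (an assumption, not an official type).
interface ModelCompositionEntry {
  task: string;
  architecture: string;
  version: string;
  interface: string;
  useCase: string;
  tags: string[];
}

// The object detection entry from the snippet above.
const OBJECT_DETECTION: ModelCompositionEntry = {
  task: 'od',
  architecture: 'yolo',
  version: '6',
  interface: 'c9',
  useCase: 'default',
  tags: [],
};

// Hypothetical helper: returns a copy of the pasted entries and appends the
// object detection entry unless an entry with task 'od' is already present.
function withObjectDetection(copied: ModelCompositionEntry[]): ModelCompositionEntry[] {
  const hasObjectDetection = copied.some((entry) => entry.task === 'od');
  return hasObjectDetection ? [...copied] : [...copied, OBJECT_DETECTION];
}
```

You would then pass the result as modelComposition, with the entries copied from the Teaching App as the argument to withObjectDetection.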

Conclusion

With OCR retraining, our base model can be adapted to your specific use case, giving you the power to remove flakiness from your AskUI automations without much time investment.

Also check the docs section for OCR teaching.
