User interface automation is hard, especially when you rely on Optical Character Recognition (OCR) to target text.
AskUI is no stranger to this challenge. While our base vision model works well for most of the text you want to target, it can be flaky with specific fonts.
The good news is that the OCR recognition rate can be greatly increased by retraining our base model with your specific text elements. For this we created the AskUI OCR Teaching Application, where you upload a screenshot that gets analyzed. You can then check each detected text element and correct any that were recognized incorrectly.
This tutorial will walk you through the process step-by-step.
Prerequisites
Create a Screenshot with the Flaky Text Elements
Flakiness often happens in specific situations, and you probably know exactly which ones. Take a screenshot of the whole screen for each of those situations and save them in a folder. Here is the one I use for this tutorial:
Start the AskUI OCR Teaching Application
The AskUI OCR Teaching Application has to be started from the AskUI Development Environment (ADE). First activate the ADE in a terminal:
askui-shell
Then import the experimental commands:
AskUI-ImportExperimentalCommands
Now start the OCR Teaching Application:
Start-AskUIOCRTeaching
This should bring up the following application window:
Create Credentials in AskUI Studio
The model you fine-tune is our base model. The retrained model is saved in your workspace and can be retrained if you run into flakiness again.
We recommend creating a new, dedicated access token in AskUI Studio under Access token, which limits the impact if the token ever leaks.
Enter the workspace id and the access token into the fields in the AskUI OCR Teaching Application.
Upload the Screenshot
Finally, it is time to import the screenshot and get a list of all the detected text elements. Click on the camera icon and import your screenshot. It may take a little while before the text elements appear on the right side.
Retrain the Model
Turn on the Trained Model switch so you can start training corrections.
Scroll through the text elements and check whether they were recognized correctly. If you find one that is wrong, as in the following screenshot, correct it and click on Train Correction. Training may take a few seconds. Once it completes, the detection runs again and you can check whether the recognition is now correct.
Do not hesitate to retrain again if the recognition is still wrong!
Copy the New Model Configuration to Your AskUI Project
Once you are happy with all the recognized text elements, you can add the model configuration to your AskUI Project. Click on Copy Model to copy the configuration to your clipboard.
Head over to your helpers/askui-helper.ts and find the line containing aui = await UiControlClient.build({. Insert the model composition as in the following code snippet:
...
aui = await UiControlClient.build({
...
modelComposition: [
<Here goes your model composition>,
// This is important!
// Otherwise only text will be detected.
{"task":"od","architecture":"yolo","version":"6","interface":"c9","useCase":"default","tags":[]}
]
});
...
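If you want to guard against forgetting the default entry from the snippet above, the list can be assembled with a small helper. The following is a minimal sketch, not part of the AskUI API: the name withObjectDetection and the ModelCompositionEntry type are hypothetical, and the sketch assumes that the configuration you copy from the Teaching Application has the same fields as the default entry and that "od" marks the entry responsible for detecting non-text elements.

```typescript
// Shape of one entry, inferred from the default entry in the snippet above.
// Assumption: the copied model configuration uses the same fields.
interface ModelCompositionEntry {
  task: string;
  architecture: string;
  version: string;
  interface: string;
  useCase: string;
  tags: string[];
}

// The default entry, copied verbatim from the snippet above.
const DEFAULT_OBJECT_DETECTION: ModelCompositionEntry = {
  task: "od",
  architecture: "yolo",
  version: "6",
  interface: "c9",
  useCase: "default",
  tags: [],
};

// Hypothetical helper: returns a new list that contains the copied entries
// and appends the default "od" entry only if none is present yet.
function withObjectDetection(
  copied: ModelCompositionEntry[]
): ModelCompositionEntry[] {
  const hasObjectDetection = copied.some((entry) => entry.task === "od");
  return hasObjectDetection
    ? [...copied]
    : [...copied, DEFAULT_OBJECT_DETECTION];
}
```

With this in place, the modelComposition option could be written as withObjectDetection([<Here goes your model composition>]), so the default entry is always included exactly once.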
Conclusion
With OCR retraining, our base model can be adapted to your specific use case, giving you the power to remove flakiness from your AskUI automations without much time investment.
Also check the docs section for OCR teaching.