Classify document templates for OCR extraction

Hi! Hope you are having a nice day. I was wondering if there is any way to classify different templates for document data extraction with OCR.

I have 4 different types of invoice that I want to extract data from, so I need a way to tell the robot which template to use to obtain certain data. I can't seem to find any activity for that in Studio.

Any suggestion is welcome, greetings!

Hello Joaco,

I can think of different ways of doing it. If you can identify the source of the image file, such as the file name or the email sender (if you are receiving the invoices through email), and that points you to the invoice type, I would use it to determine which template to use.
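As a rough illustration of that source-based routing, here is a minimal Python sketch (not UiPath code). The sender addresses, file name prefixes, template names and the function `pick_template` are all made up for the example:

```python
from typing import Optional

# Hypothetical lookup tables: replace with your real senders and file name prefixes.
SENDER_TO_TEMPLATE = {
    "billing@supplier-a.example": "template_a",
    "invoices@supplier-b.example": "template_b",
}
PREFIX_TO_TEMPLATE = {
    "supa_": "template_a",
    "supb_": "template_b",
}


def pick_template(file_name: str, sender: Optional[str] = None) -> Optional[str]:
    """Return a template name from the email sender or file name, or None if unknown."""
    # Prefer the sender when we have one, since it is usually the stronger hint.
    if sender is not None:
        template = SENDER_TO_TEMPLATE.get(sender.strip().lower())
        if template is not None:
            return template
    # Fall back to the file name prefix.
    lowered = file_name.lower()
    for prefix, template in PREFIX_TO_TEMPLATE.items():
        if lowered.startswith(prefix):
            return template
    return None
```

A `None` result would be the signal to fall back to inspecting the document content itself.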

If you don’t know the source, I would suggest first using a generic template that checks the different positions for a key value that identifies which template to use. For instance, use one template that reads the location where the “invoice id” appears in each of the 4 invoice types, saves each result to a variable (varInvoice1, varInvoice2…), and then checks which variable holds a valid value.
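The "check which variable has the right value" step could look roughly like this in Python. This is only a sketch under assumptions: it supposes the OCR step has already read the text at each template's expected invoice ID position, and the ID formats in `ID_PATTERNS` are invented, so adjust them to your real invoices:

```python
import re
from typing import Dict, Optional

# Hypothetical invoice ID formats, one per invoice type (assumptions, not real formats).
ID_PATTERNS = {
    "template_1": re.compile(r"INV-\d{6}"),
    "template_2": re.compile(r"\d{4}/\d{4}"),
    "template_3": re.compile(r"[A-Z]{2}\d{8}"),
}


def identify_template(region_texts: Dict[str, str]) -> Optional[str]:
    """Pick the template whose invoice ID region contains a well-formed ID.

    region_texts maps a template name to the text that OCR read at the
    position where that template keeps its invoice ID.
    """
    matches = [
        name
        for name, text in region_texts.items()
        if name in ID_PATTERNS and ID_PATTERNS[name].fullmatch(text.strip())
    ]
    # Only accept an unambiguous result; zero or several matches need a manual check.
    return matches[0] if len(matches) == 1 else None
```

Validating the format, rather than only checking that the variable is non-empty, helps because OCR will usually return some text from every position, even the wrong ones.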

Hope this helps you!


Hi @joacobracci!
You can also use ready-made templates from Klippa or Nanonets, or create your own pattern recognition using masks or keywords, found by coordinates in areas or in an array. Then you can create a condition for the different templates, depending on the presence of keywords or masks.
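To give an idea of the keyword condition, here is a small Python sketch. It is not tied to Klippa, Nanonets or any Studio activity, and the keyword lists and template names are placeholders:

```python
from typing import Dict, List, Optional

# Hypothetical keywords that tend to appear on each invoice type.
TEMPLATE_KEYWORDS: Dict[str, List[str]] = {
    "template_a": ["acme corp", "vat no"],
    "template_b": ["globex", "purchase order"],
}


def classify_by_keywords(ocr_text: str) -> Optional[str]:
    """Return the template with the most keyword hits in the OCR text, or None."""
    text = ocr_text.lower()
    # Count how many of each template's keywords occur in the text.
    scores = {
        name: sum(keyword in text for keyword in keywords)
        for name, keywords in TEMPLATE_KEYWORDS.items()
    }
    # On a tie, max() returns the first template listed in TEMPLATE_KEYWORDS.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Counting hits instead of stopping at the first keyword makes the check a bit more tolerant of OCR misreads on a single word.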

Thank you for your reply!
I find your solution very interesting.
Is there an activity or something already made for pattern recognition or keywords in Studio, or should I create that solution from scratch?

Thanks again for replying!

Hi Diego, thank you for your reply!
I will try that.

Thanks again for replying!

@joacobracci You can use the “Text found?” activity for strings, or the JS method array.includes('keyword') to find items in an array.
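One thing to watch for is the difference between searching inside a string and looking for an item in an array. The same distinction shown in Python, where the `in` operator stands in for JS `includes` (the sample text is made up):

```python
ocr_text = "Invoice ID: INV-000123"
ocr_words = ocr_text.split()  # ["Invoice", "ID:", "INV-000123"]

# On a string, the check is a substring search.
has_phrase_in_text = "Invoice ID" in ocr_text

# On a list, the check looks for an element that is exactly equal.
has_word_in_list = "Invoice" in ocr_words
has_phrase_in_list = "Invoice ID" in ocr_words  # no single element equals the phrase
has_id_in_list = "ID" in ocr_words  # the element is "ID:", so this does not match
```

So for multi-word keywords it is usually safer to search the full text rather than a list of separate words.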