Knowledge Base

Article ID: 1294 | Category: Project Setup | Type: How To | Last Modified: 8/22/2013

Speeding Up the Selection of Document Definitions

Description

In many invoice-processing projects, most invoices come from a small number of sources, while the rest come from a large number of different sources. In other words, a few sources issue invoices frequently, while invoices from many other sources arrive rarely. The best approach in this situation is to create separate Document Definitions for invoices from the sources that issue them frequently and to use a Generic Layout for the rest. However, the number of sources that need their own Document Definitions may itself turn out to be quite large (around 100 sources), and selecting the correct Document Definition among them becomes time-consuming. Is there a way to speed up this process?

Solution

When ABBYY FlexiCapture (FC) processes an image, a Document Definition is applied to the image before it is recognized. If the project does not have a classifier, the system tries every available Document Definition and selects the one that yields the highest degree of confidence. This can take a significant amount of time if the project contains many Document Definitions.
The Document Definition selection process can be sped up by adding a classifier that is trained to identify classes corresponding to the Document Definitions in the project. In this setup, documents are classified before a Document Definition is applied. ABBYY FlexiCapture then attempts to apply only those Document Definitions that match the classification results. Excluding the other Document Definitions from this process improves performance.
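The difference between the two selection modes can be illustrated with a small conceptual sketch in Python. It is not FlexiCapture code: the Definition class, the select_definition function, the confidence values, and the file-name-based classifier are all hypothetical stand-ins chosen only to show why narrowing the candidate list reduces the number of matching attempts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Definition:
    """Hypothetical stand-in for a Document Definition."""
    name: str
    # Returns a confidence in [0, 1] for an image; this is the costly step.
    match: Callable[[str], float]


def select_definition(
    image: str,
    definitions: Iterable[Definition],
    classify: Optional[Callable[[str], set]] = None,
):
    """Return (name of the best definition, number of matching attempts)."""
    candidates = list(definitions)
    if classify is not None:
        # With a classifier, keep only the definitions named by its result.
        allowed = classify(image)
        candidates = [d for d in candidates if d.name in allowed]

    attempts = 0
    best_name, best_conf = None, -1.0
    for d in candidates:
        attempts += 1
        conf = d.match(image)
        if conf > best_conf:
            best_name, best_conf = d.name, conf
    return best_name, attempts


def make_definition(vendor: str) -> Definition:
    # Toy matcher (assumption): high confidence only for this vendor's files.
    return Definition(
        vendor, lambda image: 0.9 if image.startswith(vendor + "_") else 0.1
    )


definitions = [make_definition(f"Vendor{i:03d}") for i in range(100)]
image = "Vendor042_invoice.tif"


def toy_classifier(image: str) -> set:
    # Toy classifier (assumption): derives the class from the file name.
    return {image.split("_")[0]}


without_classifier = select_definition(image, definitions)
with_classifier = select_definition(image, definitions, toy_classifier)
```

With these 100 stand-in definitions, both calls pick the same definition, but the call without a classifier makes 100 matching attempts while the call with the classifier makes 1.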

Step-by-step instructions on setting up a classifier in FlexiLayout Studio (FLS) and a project that uses this classifier in FC are provided below.

1) Create a classifier in FlexiLayout Studio

2) Set up the FlexiCapture project

A sample project for this scenario is available in the SampleProject.rar archive. The archive also contains a sample classifier of the kind we used and sample invoices.

Note the importance of creating a separate document classification stage and storing the classification results in the registration parameters of the documents. The values of these parameters are later used by the script triggered by the Before matching event. The separate classification stage does add processing time, but the delay is small because the project uses an image-based classifier, and it is completely offset by the significant gain in template matching accuracy that this routing setup provides. If you use this setup to process the sample batch from the SampleProject.rar archive, every document is matched to the correct Document Definition. If you disable the classification stage and the Before matching script, the wrong Document Definition is applied to 7 pages.
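The routing idea described above (classify once, store the result on the document, read it back before matching) can be sketched conceptually in Python. This is not the FlexiCapture script API: the Document class, the DocumentClass parameter name, and both functions are hypothetical, and the fallback to the full list when no class is stored is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document moving through processing stages."""
    image: str
    registration_params: dict = field(default_factory=dict)


CLASS_PARAM = "DocumentClass"  # hypothetical registration parameter name


def classification_stage(doc, classify):
    """Stage 1: run the classifier once and store the result on the document."""
    doc.registration_params[CLASS_PARAM] = classify(doc.image)


def before_matching(doc, all_definitions):
    """Stage 2 hook: narrow the definitions to try, using the stored class.

    Assumption of this sketch: if no class was stored, or it matches no
    definition name, fall back to trying the full list.
    """
    wanted = doc.registration_params.get(CLASS_PARAM)
    narrowed = [name for name in all_definitions if name == wanted]
    return narrowed or list(all_definitions)
```

Keeping classification in its own stage means the classifier runs once per document, and the matching step only has to read a stored value instead of repeating the classification.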
