Knowledge Base

Article ID: 1238 | Category: Project Setup | Type: How To | Last Modified: 8/27/2015

How can I explicitly specify the Document Definition to be used?

There are cases when the user knows in advance which Document Definition to use for a particular document. In these cases, the user can reduce processing times by manually selecting the right Document Definition instead of having the program go through all the available Document Definitions one by one.

Description

If, at the scanning stage, you already know which Document Definition should be used for a given document, you can set up the project to use that particular Document Definition instead of trying all the available Document Definitions in turn to find the right match.

Solution

Using batch types

The easiest approach is to create different batch types within the project and to allow the use of only one Document Definition for each batch type. This can be done on the Recognition tab of the Batch Type Properties dialog box. Select Project -> Batch Types... and click New... to create a new batch type or select an existing batch type and click Edit... The operator will then be able to scan documents directly into batches of the appropriate types.

Using document registration parameters

If an ABBYY FlexiCapture project contains a large number of batch types, it may take the scanning operator a long time to select the right type. Besides, you may only learn which Document Definition should be used after scanning is over (but before recognition has started).

By default, at the recognition stage, the program goes through all the Document Definitions available for the given batch type in an attempt to find the matching Document Definition. To speed up recognition, you can explicitly tell the program which Document Definitions should be used for which document:

1.  As soon as you know which Document Definition should be used for a particular document, specify its name as the value of the registration parameter of that document.

2.  Before recognition begins, use the value of the registration parameter to narrow down the set of Document Definitions that may be used. This can be done by using a Before Matching event handler or by using the registration parameter fc_Predefined:DefinitionsToMatch, depending on how exactly the Document Definitions will be matched against the documents (see below for detailed explanations).

Writing the name of the Document Definition to the document registration parameter

Option 1. The Document Definition is specified by the scanning operator

The scanning operator can manually specify the value of the registration parameter on the Scanning Station. First, this parameter should be specified in the batch type properties on the Scanning Station (the Registration Parameters tab of the Batch Type Properties dialog box; select Tools -> Batch Types... and click New... to create a new batch type or select an existing batch type and click Edit...).

If the set of Document Definitions to be used should be restricted by the registration parameter fc_Predefined:DefinitionsToMatch, specify this exact name when creating a batch type. If the set of Document Definitions to be used should be restricted by a Before Matching event handler, specify any other name (we use the value DocumentDefinitionName in the examples below). 

Option 2. The Document Definition is specified by a script 

On the Workflow tab of the Batch Type Properties dialog box (Tools -> Batch Types... and click New... to create a new batch type or select an existing batch type and click Edit...), select Advanced in the Schema field and add a stage of type Automatic, at which the script will run (you can name the script Document Processing, for example). This stage should follow the scanning stage and precede the recognition stage.

At this stage, in the Document Processing script, a DocumentDefinitionName registration parameter is created for each document, with the name of the appropriate Document Definition specified as the value of the parameter. 

Example 1. The Document Definition is selected by means of classification

In this example we assume that a set of pages arrives at this stage (termed one-page documents in ABBYY FlexiCapture). For each one-page document, classification reveals the name of the appropriate Document Definition. If no suitable Document Definition is found during classification (i.e. if ClassifyPage returns null), no document registration parameter is created.

C# sample code:

IPageClassificationResult classificationResult = FCTools.ClassifyPage(Document.Pages[0]);
if (classificationResult != null)
{
    Document.Properties.Set("DocumentDefinitionName", classificationResult.MatchedSections);
}

Note: If all of the documents are sent to the recognition stage, this approach will not reduce the time of processing, as it will be effectively the same as ordinary recognition. On the contrary, it will increase the time of processing due to additional routing required for the extra stage. However, processing time will be reduced if only some of the documents are sent to the recognition stage and the rest are deleted or exported unrecognized. 

Example 2. The Document Definition is selected based on document-specific features

In this example we assume that a set of pages arrives at this stage (termed one-page documents in ABBYY FlexiCapture). For each one-page document, the name of the appropriate Document Definition is determined based on the size of the page.

C# sample code:

int a4WidthMax = 3000;
int a4WidthMin = 2500;
int a4HeightMax = 4200;
int a4HeightMin = 3500;
if ((
    // Portrait orientation
    (Document.Pages[0].Rect.Right < a4WidthMax) && (Document.Pages[0].Rect.Right > a4WidthMin) // width
    &&
    (Document.Pages[0].Rect.Bottom < a4HeightMax) && (Document.Pages[0].Rect.Bottom > a4HeightMin) // height
) || (
    // Landscape orientation
    (Document.Pages[0].Rect.Bottom < a4WidthMax) && (Document.Pages[0].Rect.Bottom > a4WidthMin) // height
    &&
    (Document.Pages[0].Rect.Right < a4HeightMax) && (Document.Pages[0].Rect.Right > a4HeightMin) // width
))
{
    Document.Properties.Set("DocumentDefinitionName", "DocumentDefinition_forA4");
}
else
{
    Document.Properties.Set("DocumentDefinitionName", "DocumentDefinition_NOTforA4");
}
// Logging
Processing.ReportMessage(Document.Pages[0].Rect.ToString() + ": " + Document.Properties.Get("DocumentDefinitionName"));

Similarly, you can select Document Definitions based on other document-specific features, such as:

Restricting the Set of Document Definitions to be used 

There are two ways to restrict the set of Document Definitions to be used within one batch type:

1.  You can specify the value of the document registration parameter fc_Predefined:DefinitionsToMatch

2.  You can use a Before Matching event handler

The choice depends on the specifics of document processing (see below for detailed explanations). 

Method 1. Using the document registration parameter named fc_Predefined:DefinitionsToMatch 

The value of the document registration parameter fc_Predefined:DefinitionsToMatch is regarded as the set of Document Definitions that may match the document. This value can be specified by listing the names of Document Definitions, the names of specific sections, or both, separating the items by semicolons, for example: “Document Definition 1; Document Definition 2\Section 1; Document Definition 2\Section 2”. The Document Definitions will be matched against the documents as follows:

Method 2. Using a Before Matching event handler

For this sample event handler, we assume that the name of the appropriate Document Definition is known for each document and that this name is specified in the DocumentDefinitionName registration parameter (instructions for assigning this value to the registration parameter are provided above).

On the Event Handlers tab of the Batch Type Properties dialog box (Tools-> Batch Types... and click New... to create a new batch type or select an existing batch type and click Edit...), add a Before Matching event handler, which will run before each document is recognized (i.e. before the program starts looking for the appropriate Document Definition). This event handler uses the value of the DocumentDefinitionName registration parameter to restrict the set of Document Definitions to be matched against the document.

This value can be specified by listing the names of Document Definitions, the names of specific sections, or both, separating the items by semicolons, for example: “Document Definition 1; Document Definition 2\Section 1; Document Definition 2\Section 2”.
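
The semicolon-separated format can be assembled with a small helper. The following is only a sketch: the class and method names (DefinitionsListBuilder, Build) are hypothetical and are not part of the ABBYY FlexiCapture API. It uses only standard .NET calls and shows how a list of Document Definition names and "Definition\Section" entries might be turned into a value of the form shown in the example.

C# sample code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the ABBYY FlexiCapture API.
public static class DefinitionsListBuilder
{
    // Joins Document Definition names and "Definition\Section" entries
    // into one semicolon-separated string, skipping null or blank entries.
    public static string Build(IEnumerable<string> entries)
    {
        return string.Join("; ", entries
            .Select(e => (e ?? string.Empty).Trim())
            .Where(e => e.Length > 0));
    }
}
```

For instance, calling Build with the entries "Document Definition 1" and "Document Definition 2\Section 1" produces the string "Document Definition 1; Document Definition 2\Section 1", which could then be written to the registration parameter.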

This event handler allows you to implement various processing options. Below we describe the two most common options.

Option 1

The event handler changes the set of Document Definitions available for a document only if the document has the DocumentDefinitionName registration parameter.

C# sample code:

if (Document.Properties.Has("DocumentDefinitionName"))
{
    Matching.DefinitionsList = Document.Properties.Get("DocumentDefinitionName");
    if (Document.Properties.Get("DocumentDefinitionName") == string.Empty)
    {
        Matching.ForceMatch = true;
    }
}

Option 2

Unlike in Option 1, here you can specify no more than one section for each document. If you specify a Document Definition that has more than one section, or if you list several Document Definitions separated by semicolons, the event handler will return an error. This is because a forced match can involve only one section (for a forced match, the ForceMatch flag is set to true). On the other hand, the specified section will be treated as a good match and applied to the document even if the required data are not found on the image. In this case, the regions of the fields can be specified manually. Additionally, a data form will be displayed for this page at the verification stage, where the operator will be able to enter the values of the fields.

C# sample code:

if (Document.Properties.Has("DocumentDefinitionName"))
{
    Matching.DefinitionsList = Document.Properties.Get("DocumentDefinitionName");
    Matching.ForceMatch = true;
}
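
Because a forced match fails when the value lists several items, it may be worth checking the value before setting ForceMatch. The sketch below is an assumption-laden illustration: ForcedMatchCheck and IsSingleEntry are hypothetical names, not ABBYY FlexiCapture API members, and the check only detects semicolon-separated lists. It cannot tell whether a named Document Definition itself contains more than one section.

C# sample code:

```csharp
using System;

// Hypothetical helper, not part of the ABBYY FlexiCapture API.
public static class ForcedMatchCheck
{
    // Returns true when the value is non-blank and contains no semicolon,
    // i.e. it names a single entry rather than a list of entries.
    public static bool IsSingleEntry(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            return false;
        }
        return value.IndexOf(';') < 0;
    }
}
```

An event handler could, for example, set Matching.ForceMatch to true only when IsSingleEntry returns true for the value of the DocumentDefinitionName registration parameter, and fall back to ordinary matching otherwise.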

Using barcodes

Sometimes documents can be supplied with barcodes, whose values can then be used to select the appropriate processing scenario. If, at the scanning stage, you already know which Document Definition should be used for the given document, you can place a barcode on this document. The value of this barcode will then be used to select the appropriate Document Definition.

If the scanning settings specify that barcodes should be used to separate documents, the Index field of each document will contain the value of the corresponding barcode. If documents are imported from a Hot Folder, or if loose pages are forwarded from the Scanning Station without being separated into documents, barcodes can also be selected as the means of document separation in the batch type properties in ABBYY FlexiCapture (the Image Preprocessing tab of the Batch Type Properties dialog box; select Project -> Batch Types... and click New... to create a new batch type or select an existing batch type and click Edit...).

This value (i.e. Document.Index) can then be accessed when running a Before Matching event handler in the same way as we accessed the registration parameter in the example above.
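
Barcode values rarely coincide with Document Definition names, so a lookup step is usually needed in between. The following sketch shows one possible way to do this in plain C#. The class BarcodeRouting, its Resolve method, the barcode values ("INV", "PO"), and the Document Definition names are all hypothetical and serve only as an illustration. Whether such a helper can be declared in the script itself or has to live in a referenced assembly depends on your project setup.

C# sample code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the ABBYY FlexiCapture API.
public static class BarcodeRouting
{
    // Hypothetical barcode values mapped to hypothetical Document Definition names.
    private static readonly Dictionary<string, string> DefinitionByBarcode =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "INV", "Invoice" },
            { "PO", "Purchase Order" }
        };

    // Returns the Document Definition name for a barcode value,
    // or null if the value is missing or not in the table.
    public static string Resolve(string barcodeValue)
    {
        if (barcodeValue == null)
        {
            return null;
        }
        string name;
        return DefinitionByBarcode.TryGetValue(barcodeValue.Trim(), out name) ? name : null;
    }
}
```

In a Before Matching event handler, the result of Resolve for the barcode value held in Document.Index could be assigned to Matching.DefinitionsList when it is not null, leaving the default matching behavior in place for unknown barcodes.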

If barcodes will not be used for document separation but you want to use barcode values to select Document Definitions, you can do the following:

1. On the Scanning Station, run a script that will find the barcode in each document and assign its value to a certain registration parameter of the document.

2. Use the value of this parameter to process the document in ABBYY FlexiCapture in the same way as described in the previous solution.

If documents are not to be scanned by an operator but are to be loaded automatically as images from a Hot Folder, you can use scripts to perform this scenario automatically on the Scanning Station. Images will be imported from the Hot Folder, barcodes will be detected, and the pages will be sent on for further processing with ABBYY FlexiCapture.

Zyuzin Andrew
Project Manager