Intelligent Document Processing

IDP Source

Introduction to IDP Source

IDP Source is configured to use the IDP Template, which processes text data that is in image format.

Google Vision and Azure Form Recognizer are available based on the provisioned license. If neither is licensed, the IDP Source configuration entity does not appear in the Platform under Management > Configuration Management.

IDP Source uses Google Vision or Form Recognizer to manipulate the image format of structured data.

IDP recognizes text from the image using Google Vision or Form Recognizer as per your configuration. IDP Source is mapped in the Google Vision / Form Recognizer activity in the process flows.

Creating and Configuring IDP Source

  • If you are going to use the Google Vision feature in the IDP Source, you should create the input JSON file in your Google Vision account. This JSON file is uploaded in the Google Vision configuration in the Platform.

  • If you are going to use the Azure Form Recognizer feature in the IDP Source, you should create a project in the Azure Form Recognizer, from which you can generate the Endpoint URL and Secret Key. These details are used in the Azure Form Recognizer configuration in the Platform.
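Before uploading the Google Vision key file, it can help to confirm that the file is well-formed. The following is a minimal sketch, assuming the key is a standard Google Cloud service account key; the function name check_key_file and the sample values are hypothetical and are not part of the Platform.

```python
import json

# Fields normally present in a Google Cloud service account key file.
# Assumption: the JSON key generated for Google Vision is of this kind.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(text: str) -> list[str]:
    """Return a list of problems found in the key file text (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if name not in data
    ]
    if data.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems


# Hypothetical sample values, for illustration only.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "idp@example-project.iam.gserviceaccount.com",
})
print(check_key_file(sample_key))
```

An empty list means that the basic structure looks right. It does not confirm that the key is active or that the Vision API is enabled for the project.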

Follow the below steps for creating a new IDP Source.

  1. Click the Burger menu and navigate to Management > Configuration Management.
  2. Click IDP Source in the configuration entity panel.

    Figure 1: Creating new IDP Source

  3. Click +Create New.

  4. Enter the Basic details in the Create IDP Source panel as explained below.

    Field Description

    Name*

    Enter the name of the IDP Source.

    Character limit: 50.

    Data type: Alphanumeric and underscore.

    Is Default

    Select the Is Default checkbox to set this IDP Source as the default IDP Source.

    - You can configure only one IDP Source as the default IDP Source.
    - If you set an IDP Source as default, the previously existing default IDP Source changes to non-default automatically.
    - Clear the checkbox if you do not need the IDP Source as default.

    Source Type*

    Select the required source type from the drop-down.

    - Google Vision: Allows you to read the input image file using the Google Vision features.
      - To use the Google Vision feature, create a Google Vision account, generate a key of type JSON in that account, and download it for uploading to the IDP Source. Refer to Creating JSON File in Google Vision for creating the JSON key.
      - When you select Google Vision, the Upload Key File (JSON) field appears.
      - Upload the JSON key file that you generated from your Google Vision account. Once you upload the JSON file, the Google Vision account connection is established in the Platform.
    - AzureFormRecognizer: Allows you to read the input image file using the Azure Form Recognizer features.
      - To use the Azure Form Recognizer feature, create a project in the Azure account and then create an Endpoint URL and Secret Key using that project name. Refer to Generating Endpoint URL and Secret Key in Azure for details on creating the Endpoint URL.
      - When you select AzureFormRecognizer, the EndPoint and Secret Key fields appear in the configuration section.
      - Enter the Endpoint URL and the Secret Key in the respective fields.

    Description

    Write a brief description of the IDP Source.

    Character limit: 1000 characters

    Data type: Alphanumeric and symbols.
  5. Click Create at the bottom right of the page. The IDP Source is created with the details entered.
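The Name and Description limits listed above can be expressed as simple checks. This is a sketch only, assuming that "alphanumeric" means ASCII letters and digits and that Name is mandatory; the function names are hypothetical, and the Platform performs its own validation.

```python
import re

# Name: 1 to 50 characters, letters, digits, and underscore only.
# Assumption: "alphanumeric" is limited to ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,50}")
MAX_DESCRIPTION_LENGTH = 1000


def is_valid_name(name: str) -> bool:
    """Check the Name field against the documented limits."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Check the Description field against the documented length limit."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


print(is_valid_name("Invoice_Source_01"))
print(is_valid_name("Invoice Source"))  # contains a space
```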

Viewing and Editing IDP Source

  1. Click the Burger menu and navigate to Management > Configuration Management > IDP Source.
  2. Click the IDP Source card to view the details of the selected IDP Source. The details of the IDP Source appear in the Info Actions panel (Edit IDP Source).

    Figure 2: Editing IDP Source details

  3. Edit the IDP Source details as needed.

  4. Click Save.

Duplicating IDP Source

Follow the below steps for duplicating an existing IDP Source.

  1. Click the Burger menu and navigate to Management > Configuration Management.
  2. Click IDP Source. The list of all IDP Sources is displayed.

  3. Hover over any IDP Source card. Three dots appear on the upper right side of the card.

  4. Click the three dots. More Actions appear.

    Figure 3: Duplicating the IDP Source

  5. Click Duplicate. A confirmation pop-up appears.

    Figure 4: Duplicate confirmation

  6. Click Ok for duplicating the IDP Source (or you can click Cancel to cancel the duplicate action). A Success message appears on the successful duplication of the IDP Source.

    Figure 5: Duplicate success message

  7. Click Ok. A duplicate copy of the IDP Source appears on the IDP Source page with the same IDP Source name suffixed with “_copied”.

    Figure 6: Duplicated IDP Source

Deleting IDP Source

  1. Click the Burger menu and navigate to Management > Configuration Management.
  2. Click IDP Source. The list of all IDP Sources is displayed.

  3. Click the IDP Source name card that is to be deleted. The lower-right of the page displays the Delete button.

    Figure 7: Deleting IDP Source

  4. Click Delete. A Confirmation pop-up for delete appears.

    Figure 8: IDP Source Delete confirmation

  5. Click Ok for deleting the IDP Source.

    Or

    Click Cancel to cancel the action.

Alternatively, you can follow the below steps to delete the IDP Source:

  1. Click the Burger menu and navigate to Management > Configuration Management > IDP Source.
  2. Hover over the IDP Source card. Three dots appear on the upper right side of the card.

  3. Click the three dots. More Actions appear.

  4. Click Delete and follow step 5 in the above procedure.

    Figure 9: Delete action in More Actions

IDP Template

Introduction to IDP Template

IDP Template is used to upload a template file. During process execution, the activity associated with the IDP Template extracts text from the input file according to the annotations made when the IDP Template was created. Because the details are extracted from exact coordinates, the fields to be extracted must be positioned in the input file at the same coordinates that you annotated in the template.

IDP Template is mapped in the IDP (Tesseract), Google Vision (licensed separately), and Form Recognizer (licensed separately) activities in the process flows.

You can view the IDP Template configurations in Management > Configuration Management > IDP Template section.

Creating and Configuring IDP Template

The required annotations need to be configured in the IDP template with proper identifiers. Follow the below steps to create an IDP template.

  1. Click the Burger menu and navigate to Management > Configuration Management.
  2. Click IDP Template in the configuration entity panel.

    Figure 10: Creating a new IDP Template

  3. Click + Create New.

  4. Enter the Basic details in the Create IDP Template panel as explained below.

    Field Description
    Name

    Enter the name of the IDP Template.

    Character limit: 50.

    Data type: Alphanumeric and underscore.

    Upload File

    Click and select (or drag) the sample template for IDP.

    The File format can be PDF, JPEG, or TIFF.

    The file format selected here should be uploaded in the
    template.

    Description

    Write a brief description of the IDP Template.

    Character limit: 1000 characters

    Data type: Alphanumeric and symbols.

  5. Click More. The uploaded file appears. You can upload multiple pages.

    Figure 11: Uploaded file displayed for the annotation

  6. Annotate the required details. The Thumbnail on the left side allows you to navigate to pages and annotate as needed. Without annotating the file, you cannot save the template. That is, at least one annotation should be done.
    Follow the below steps for annotating the uploaded file.

  7. Click and drag to select the area in the file where the input data is expected to appear. Size the box to fit the maximum length of data that you expect. An annotation box appears prompting you to enter the Identifier Name.

    Figure 12: Annotating the input file

  8. Enter the identifier name for the selected area and click OK. You can drag and move around the annotation box.

  9. Click the annotation box again to edit the annotation. After you annotate the content, a delete icon appears; click it to delete the annotation.

    Figure 13: Details in the annotated file

  10. Click Create at the bottom right of the page. The IDP Template is created.

At run time, content is read from the input file at the positions annotated in this template. The coordinates in the template are mapped to the input file, and the extracted data is stored in the corresponding variable.
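The idea of reading content from fixed positions can be illustrated with a small sketch. It uses a grid of text lines as a stand-in for image coordinates; the Annotation class, the extract_fields function, and the sample data are all hypothetical and do not represent how the Platform stores or applies templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    identifier: str  # identifier name entered for the annotation box
    row: int         # top edge of the box (line index on the page)
    col: int         # left edge of the box (character index in the line)
    height: int      # number of lines covered by the box
    width: int       # number of characters covered by the box


def extract_fields(page: list[str], annotations: list[Annotation]) -> dict[str, str]:
    """Read the text found inside each annotated box of a page."""
    fields = {}
    for box in annotations:
        rows = page[box.row:box.row + box.height]
        pieces = [line[box.col:box.col + box.width].strip() for line in rows]
        fields[box.identifier] = " ".join(piece for piece in pieces if piece)
    return fields


# Hypothetical template: two boxes, each ten characters wide, starting at column 12.
template = [
    Annotation("InvoiceNumber", row=0, col=12, height=1, width=10),
    Annotation("TotalAmount", row=2, col=12, height=1, width=10),
]

# Hypothetical input page whose values start at column 12, as the template expects.
page = [
    "Invoice No:".ljust(12) + "INV-0042",
    "Date:".ljust(12) + "2024-01-15",
    "Total:".ljust(12) + "1250.00",
]

print(extract_fields(page, template))
```

If a value in the input starts further to the right than the annotated box, only the part that falls inside the box is read. This is why the input file must match the template layout.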

Viewing and Editing IDP Template

  1. Click the Burger menu and navigate to Management > Configuration Management > IDP Template.
  2. Click the IDP Template card to view the details of the selected IDP Template. The details of the IDP Template appear in the Info Actions panel (Edit IDP Template).

  3. Click More to view the annotated details.

    Figure 14: Editing IDP Template details

  4. Edit the IDP Template details as needed.

  5. Click Save.

Duplicating IDP Template

Follow the below steps for duplicating an existing IDP Template.

  1. Click the Burger menu and navigate to Management > Configuration Management.
  2. Click IDP Template. The list of all IDP Templates is displayed.

  3. Hover over any IDP Template card. Three dots appear on the upper right side of the card.

  4. Click the three dots. More Actions appear.

    Figure 15: Duplicating the IDP Template

  5. Click Duplicate. A confirmation pop-up appears.

    Figure 16: Duplicate confirmation

  6. Click Ok for duplicating the IDP Template (or you can click Cancel to cancel the duplicate action). A Success message appears on the successful duplication of the IDP Template.

    Figure 17: Duplicate success message

  7. Click Ok. A duplicate copy of the IDP Template appears on the IDP Template page with the same IDP Template name suffixed with “_copied”.

    Figure 18: Duplicated IDP Template

Deleting IDP Template

  1. Click the Burger menu and navigate to Management > Configuration Management.
  2. Click IDP Template. The list of all IDP Templates is displayed.

  3. Click the IDP Template name card that is to be deleted. The lower-right of the page displays the Delete button.

    Figure 19: Deleting IDP Template

  4. Click Delete. A Confirmation pop-up for delete appears.

    Figure 20: IDP Template Delete confirmation

  5. Click Ok for deleting the IDP Template.

    Or

    Click Cancel to cancel the action.

Alternatively, you can follow the below steps to delete the IDP Template:

  1. Click the Burger menu and navigate to Management > Configuration Management > IDP Template.
  2. Hover over the IDP Template card. Three dots appear on the upper right side of the card.

  3. Click the three dots. More Actions appear.

  4. Click Delete and follow step 5 in the above procedure.

    Figure 21: Delete action in More Actions

Platform IDP Process Flow Activities

The IDP activities in the process flow module are used to extract the required information from a non-text-based file. The IDP activities extract text from images and convert it into an editable format.

Platform IDP

IDP is an inbuilt IDP activity that is available in the Platform. For configuring an IDP activity in a process flow you need to configure the IDP Template and the document variable to which the file will be uploaded.

Refer to Platform IDP for details.

Google Vision

Google Vision activity is an advanced activity available as per your organization's license.

Before configuring the Google Vision activity in the process flows, you should create an IDP Source with Google Vision as the Source Type and an IDP Template.

Configuring Google Vision Activity

For using the Google Vision activity, you need to add the activity to the activities list and then configure the same in the process flow.

The Google Vision activity allows you to extract characters from a non-text-based file using the Google Vision features.

Google Vision Activity

Properties: Basic, Google Vision, Error Handling

Dependent entities: Variables.

Dependent configuration(s): IDP Source, IDP Template.

  1. Navigate through App Studio > Process Flows.

  2. In the Process Design > Activities panel click “+” to view the advanced connectors.

  3. Click Google Vision activity. The activity gets added to the Activities panel.

    Google vision activity

  4. Drag the Google Vision activity into the design canvas. The properties panel displays the configuration details of the activity.

    Figure 22: Google Vision property

  5. Click Basic accordion.

  6. Provide a Name and Description for the activity.

    Details

  7. Click Google Vision accordion.

    Google Vision

  8. Click Select Google Vision Account Key and select the IDP Source from the drop-down. The IDP Sources that are configured in Management > Configuration Management > IDP Source with Google Vision as the source type are displayed for selection.

    Select Google Vision Account Key

  9. Select the Document Type from the drop-down. IDP accepts PDF, JPG, JPEG, and TIFF file formats. The input file is converted into a JPEG file.

  10. Specify where the input document file is stored and whether a template applies.

    1. If the input document file is stored in a document variable, select the Is Document Variable checkbox (Is Document Variable = True).

    2. Select the document variable from the Document Variable drop-down list.

    3. Otherwise, if the input document file is stored on your local disk or directory, clear the Is Document Variable checkbox (Is Document Variable = False).

    4. In the Local Directory field, enter the local path and file name of the document file as it resides on the server.

    5. If the input is structured data, that is, its structure is configured in the IDP template, select the Template checkbox (Template = True). Go to step 11.

    Otherwise, if the input file is stored in a document variable, clear the Template checkbox (Template = False). Go to step 13.

  11. Select the matching template for the input file from the Select IDP Template drop-down. This template is created in the Management > Configuration Management section.

  12. Select Attribute Variables for each of the Input Attributes displayed as per the template selected. The attributes are based on how you configure the IDP template.

    Figure 23: Input attributes and variable mapping

  13. If the input file is stored in the document variable, select an alphanumeric variable from the Output Variable drop-down. The output variable is of the JSON format. This is used to extract the complete content of a file with multiple pages.

    Output Variable

    Sample JSON Output format: [{"pageNumber": 1,"extractedSource": "page1.jpg","extractedcontent":"Hello World"}]. Extracted Source indicates the path of the input file that has been converted into JPEG format.

  14. Click Error Handling accordion.

    Figure 24: Error Handling

  15. Check the Continue If Error check box if you want to continue the workflow even if an error occurs.

    Else uncheck the checkbox to break or stop the workflow execution when an error occurs.

  16. Select a variable from the Error Variable drop-down. The variable should be of the alphanumeric data type. The error variable stores the first line of the error.
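Once the activity has run, the content of the output variable can be read as ordinary JSON. The following sketch assumes that the variable holds a JSON array with the keys used in the sample output format (pageNumber, extractedSource, extractedcontent); the function name combine_pages and the two-page sample are hypothetical.

```python
import json


def combine_pages(output_json: str) -> str:
    """Join the extracted content of all pages, ordered by page number."""
    pages = json.loads(output_json)
    ordered = sorted(pages, key=lambda page: page["pageNumber"])
    return "\n".join(page["extractedcontent"] for page in ordered)


# Hypothetical two-page output, listed out of order on purpose.
sample_output = (
    '[{"pageNumber": 2, "extractedSource": "page2.jpg", "extractedcontent": "Second page"},'
    ' {"pageNumber": 1, "extractedSource": "page1.jpg", "extractedcontent": "Hello World"}]'
)
print(combine_pages(sample_output))
```

Sorting by pageNumber is a precaution in this sketch. Whether the Platform always returns the pages in order is not stated here.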

Form Recognizer

Azure Form Recognizer activity is an advanced activity available as per your organization's license.

Before configuring the Form Recognizer activity in the process flows, you should create an IDP Source with Form Recognizer as the Source Type and an IDP Template.

Configuring Form Recognizer Activity

For using the Form Recognizer activity, you need to add the activity to the activities list and then configure the same in the process flow.

The Form Recognizer activity allows you to extract characters from a non-text-based file using the Azure Form Recognizer features.

Form Recognizer Activity

Properties: Basic, Form Recognizer, Error Handling

Dependent entities: Variables.

Dependent configuration(s): IDP Source, IDP Template.

  1. Navigate through App Studio > Process Flows.

  2. In the Process Design > Activities panel click “+” to view the advanced connectors.

  3. Click Form Recognizer activity. The activity gets added to the Activities panel.

    Form Recognizer

  4. Drag the Form Recognizer activity into the design canvas. The properties panel displays the configuration details of the activity.

    Figure 25: Form Recognizer Properties

  5. Click Basic accordion.

  6. Provide a Name and Description for the activity.

    Details

  7. Click IDP accordion.

    IDP

  8. Click Select Form Recognizer Account and select the IDP Source from the drop-down. The IDP Sources that are configured in Management > Configuration Management > IDP Source with Form Recognizer as the source type are displayed for selection.

    Select Form Recognizer Account

  9. Select the Document Type from the drop-down. IDP accepts PDF, JPG, JPEG, and TIFF file formats. The input file is converted into a JPEG file.

  10. Specify where the input document file is stored and whether a template applies.

    1. If the input document file is stored in a document variable, select the Is Document Variable checkbox (Is Document Variable = True).

    2. Select the document variable from the Document Variable drop-down list.

    3. Otherwise, if the input document file is stored on your local disk or directory, clear the Is Document Variable checkbox (Is Document Variable = False).

    4. In the Local Directory field, enter the local path and file name of the document file as it resides on the server.

    5. If the input is structured data, that is, its structure is configured in the IDP template, select the Template checkbox (Template = True). Go to step 11.

    Otherwise, if the input file is stored in a document variable, clear the Template checkbox (Template = False). Go to step 13.

  11. Select the matching template for the input file from the Select IDP Template drop-down. This template is created in the Management > Configuration Management section.

  12. Select Attribute Variables for each of the Input Attributes displayed as per the template selected. The attributes are based on how you configure the IDP template.

    Figure 26: Input attributes and variable mapping

  13. If the input file is stored in the document variable, select an alphanumeric variable from the Output Variable drop-down. The output variable is of the JSON format. This is used to extract the complete content of a file with multiple pages.

    Output Variable

    Sample JSON Output format: [{"pageNumber": 1,"extractedSource": "page1.jpg","extractedcontent":"Hello World"}]. Extracted Source indicates the path of the input file that has been converted into JPEG format.

  14. Click Error Handling accordion.

    Figure 27: Error Handling

  15. Check the Continue If Error check box if you want to continue the workflow even if an error occurs.

    Else uncheck the checkbox to break or stop the workflow execution when an error occurs.

  16. Select a variable from the Error Variable drop-down. The variable should be of the alphanumeric data type. The error variable stores the first line of the error.
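The effect of the Continue If Error and Error Variable settings can be pictured with the sketch below. It is an illustration of the described behavior only; run_activity and the failing activity are hypothetical and are not Platform APIs.

```python
from typing import Callable, Optional


def run_activity(
    activity: Callable[[], str], continue_if_error: bool
) -> tuple[Optional[str], Optional[str]]:
    """Run an activity and return (result, error_variable)."""
    try:
        return activity(), None
    except Exception as exc:
        if not continue_if_error:
            raise  # Continue If Error is unchecked: the workflow stops here
        lines = str(exc).splitlines()
        # Only the first line of the error is kept in the error variable.
        first_line = lines[0] if lines else type(exc).__name__
        return None, first_line


def failing_activity() -> str:
    # Hypothetical failure with a multi-line message.
    raise ValueError("Unsupported document type\nadditional detail")


print(run_activity(failing_activity, continue_if_error=True))
```

With continue_if_error set to False, the same call raises the exception instead of returning, which corresponds to the workflow execution stopping.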

IDP Activities and Confidence Score

The confidence score for IDP-extracted content is the system's estimate, expressed as a percentage, of how correctly the extracted content was converted to editable text.

To display the confidence score for extracted content, create another Text Field in your Form UI and give it the same property name as the extracted content Text Field, suffixed with _confidence. The property name here is the Property name of the Text Field component in the Form UI. For example, if FormA.TotalAmount is the property name of the extracted content, then the property name for the confidence score field should be FormA.TotalAmount_confidence. After the corresponding IDP activity runs, the confidence score is displayed in that text field of the form.
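The naming rule can be sketched as follows. The helper names and the sample form values, including the score shown, are hypothetical and serve only to show how the two property names pair up.

```python
from typing import Optional

CONFIDENCE_SUFFIX = "_confidence"


def confidence_property(content_property: str) -> str:
    """Derive the confidence field's property name from the content field's name."""
    return content_property + CONFIDENCE_SUFFIX


def pair_with_confidence(
    form_values: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Pair each extracted value with its confidence score, if one is present."""
    paired = {}
    for name, value in form_values.items():
        if name.endswith(CONFIDENCE_SUFFIX):
            continue  # confidence fields are attached to their content field
        paired[name] = (value, form_values.get(confidence_property(name)))
    return paired


# Hypothetical form values after an IDP activity has run.
form_values = {
    "FormA.TotalAmount": "1250.00",
    "FormA.TotalAmount_confidence": "98",
}
print(confidence_property("FormA.TotalAmount"))
print(pair_with_confidence(form_values))
```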