Invoice scanning process: OCR tool

Jun 21, 2014 | Invoice Management

In Accounts Payable it is well known that the final destination of a provider’s invoice is its recording in the accounts, and every improvement we introduce in the provider invoice receipt process is aimed at shortening the accounting cycle, so as to reduce the cost of the process as far as possible.

Invoice Automation: The process

Within an automated provider invoice process, four main phases can be defined:

  • Invoice validation and two- or three-way matching against the purchase order.
  • Invoice approval, for the acceptance of the invoice based on company criteria.
  • Exception resolution, with exceptions categorized so that each type has a predetermined resolution.
  • Automatic accounting of invoices.
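
As an illustration of the matching phase, here is a minimal sketch in Python. It assumes the usual reading of that step: a two-way match compares the invoice with the purchase order, and a three-way match also compares it with the goods receipt. The function name, the field names and the price tolerance are hypothetical and do not come from any specific workflow tool.

```python
from decimal import Decimal


def match_line(invoice_line, po_line, receipt_qty=None,
               price_tolerance=Decimal("0.01")):
    """Return a list of exception labels; an empty list means the line matches.

    invoice_line and po_line are dicts with 'unit_price' and 'quantity'.
    receipt_qty is optional: when given, the check becomes a three-way match.
    """
    exceptions = []
    # Two-way match: invoice against purchase order.
    if abs(invoice_line["unit_price"] - po_line["unit_price"]) > price_tolerance:
        exceptions.append("price_mismatch")
    if invoice_line["quantity"] > po_line["quantity"]:
        exceptions.append("quantity_exceeds_order")
    # Third way: invoice against what was actually received.
    if receipt_qty is not None and invoice_line["quantity"] > receipt_qty:
        exceptions.append("quantity_exceeds_receipt")
    return exceptions
```

The labels returned by a check like this are what the exception resolution phase would then categorize.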

All these phases can be automated with workflow tools, as we indicated in a previous post, but there is a basic requirement for this to work: the information contained in the invoice must be available. This poses no problem for electronic invoices, as all the basic information is held in structured form in the file, but what happens with paper invoices and those received by email in PDF format?

Invoice Scanning Process: Tools and Functionality

Paper invoices, once scanned, and invoices in PDF format require an OCR (Optical Character Recognition) tool to extract the information.

What is an invoice OCR?

Invoice OCR allows us to extract the data from the invoice and use it in the subsequent processing of the invoice.

At easyap we use several first-class OCR platforms to process paper and PDF invoices for outsourced accounts payable invoice processing. We have been using them for over 12 years, which has given us extensive experience of their true potential, limits and dependability.

OCR platform functionality

OCR platforms are a great help, but they are still far from being autonomous, “out of the box” solutions that can run unattended without development and maintenance. Generally, an OCR tool should cover the following steps:

  • 1. Document classification, which should allow invoices to be identified and separated from one another and from their annexes.
  • 2. OCR, for the recognition of the invoice data. Depending on the technology used, this phase may require more or less parameterization work beforehand.
  • 3. Manual validation of the invoices, to correct data wrongly extracted by OCR. It should be stressed that this phase is not optional: given the error rate, invoices cannot be processed automatically after OCR without manual validation.
  • 4. Quality control, to resolve incidents from previous phases such as more than one document per invoice, poor-quality images, upside-down images, etc.

Invoice OCR platforms fall into two big categories: template-based OCR and keyword-based OCR. The first requires a specific template for each provider, indicating where each field is on that provider’s invoice. The second requires search zones and keywords to be defined so that each field can be located generically across all provider invoices. Some keyword-based OCRs incorporate a learning model which allows templates to be created automatically from the manually validated data.
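
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not how any particular OCR platform works: it assumes the OCR engine has already returned each recognized word with its position on the page, and the `Word` class, function names, coordinates and tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


def extract_by_template(words, zone):
    """Template approach: read whatever falls inside a fixed zone
    (x_min, y_min, x_max, y_max) defined for one specific provider."""
    x0, y0, x1, y1 = zone
    inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


def extract_by_keyword(words, keywords, line_tolerance=5.0):
    """Keyword approach: find a label such as 'Total' anywhere on the page
    and take the nearest word to its right on the same line."""
    for label in words:
        if label.text.lower().rstrip(":") in keywords:
            right = [w for w in words
                     if abs(w.y - label.y) <= line_tolerance and w.x > label.x]
            if right:
                return min(right, key=lambda w: w.x).text
    return None


page = [Word("Date:", 50, 100), Word("21/06/2014", 100, 100),
        Word("Total:", 400, 700), Word("121.00", 460, 700)]
total_from_template = extract_by_template(page, (450, 690, 550, 710))
total_from_keyword = extract_by_keyword(page, {"total"})
```

The template zone only works while the provider keeps the total in the same place, whereas the keyword search works for any layout but will pick up the wrong value if the word "Total" also appears elsewhere on the page.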

Additionally, they can be distinguished by whether or not they are able to capture lines of detail.

The main strength of template-based invoice OCR is a higher level of recognition for each provider; its weakness is the need to define a template for each provider. Defining templates takes time and depends on technical resources. As a reference, creating a template for a provider takes on average 12 minutes for invoices without lines and 19 minutes for invoices with lines of detail. On average, a provider changes some parameter that affects the format of the invoice every 17 months, which requires the template to be redesigned.
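
To give an idea of what these averages mean in practice, the following Python sketch estimates the template workload. The 12 and 19 minutes and the 17-month cycle are the figures quoted above. The number of providers, the share with lines of detail, and the assumption that redesigning a template takes as long as creating it are hypothetical inputs chosen only for the example.

```python
def template_effort_hours(providers, share_with_lines,
                          minutes_plain=12, minutes_with_lines=19,
                          months_between_changes=17):
    """Return (initial_hours, yearly_maintenance_hours) for a provider base.

    Assumes every redesign costs the same as the initial template.
    """
    with_lines = providers * share_with_lines
    plain = providers - with_lines
    initial_minutes = plain * minutes_plain + with_lines * minutes_with_lines
    # One redesign every 17 months is 12/17 of a redesign per year.
    yearly_minutes = initial_minutes * 12 / months_between_changes
    return initial_minutes / 60, yearly_minutes / 60


# Hypothetical base: 1,000 providers, 30% of them invoicing with lines.
initial_hours, yearly_hours = template_effort_hours(1000, 0.3)
```

With these example inputs the initial set-up comes to 235 hours (700 × 12 + 300 × 19 = 14,100 minutes), and maintenance adds roughly 166 hours per year.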

OCRs based on keywords have a low level of accuracy and generate “false positives” when locating the invoice fields, but require less recurring parameterization. The initial configuration, although quick, depends on highly experienced technical resources.

Invoice OCR: Location of data

In terms of locating invoice data and the level of accuracy achieved, five major groups can be distinguished:

  • Numerical data: the reading of numerical data is quite accurate, especially for values that can be validated arithmetically, and particularly in invoices which carry taxes. Invoices with several tax rates, or invoices from countries where there are no taxes such as VAT or where the rate is 0%, are more complex to capture by OCR and for this reason have a higher error rate.
  • Data which can be validated with external sources: such as the provider or client tax number. Validating the recognized data against external databases increases the accuracy level in automatic mode.
  • Data with a pre-defined format: for example order numbers or dates. It is quite common for order numbers to follow a pattern in terms of length and numeric range, and searching for data by locating specified formats simplifies the procedure and improves the accuracy rate.
  • Fields without structure or pre-defined format: such as invoice number, provider delivery number, office, department, etc. As they have no pattern to search for, these fields have the greatest number of errors in automatic recognition.
  • Lines of detail: for invoices with a purchase order, the lines of the invoice usually have to be extracted for matching against the purchase order and/or goods entry. This element is the most complex to detect and the one that requires the most manual validation to catch errors. The complexity is such that not all OCR software can extract lines.
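
Two of the checks mentioned above, arithmetic validation of amounts and searching by pre-defined format, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the one-cent rounding tolerance and the order number pattern (ten digits starting with 45) are hypothetical, and each company would use its own.

```python
import re
from decimal import Decimal


def amounts_consistent(net, tax_rate, tax, total, tolerance=Decimal("0.01")):
    """Arithmetic check: net x rate should give the tax, and net + tax the total.
    A misread digit in any of the three amounts normally breaks one equality."""
    expected_tax = (net * tax_rate).quantize(Decimal("0.01"))
    return (abs(expected_tax - tax) <= tolerance
            and abs(net + tax - total) <= tolerance)


# Hypothetical format: order numbers are 10 digits and start with 45.
ORDER_NUMBER = re.compile(r"\b45\d{8}\b")


def find_order_numbers(ocr_text):
    """Format check: return every token in the OCR text that fits the pattern."""
    return ORDER_NUMBER.findall(ocr_text)


ok = amounts_consistent(Decimal("100.00"), Decimal("0.21"),
                        Decimal("21.00"), Decimal("121.00"))
misread = amounts_consistent(Decimal("100.00"), Decimal("0.21"),
                             Decimal("21.00"), Decimal("127.00"))
orders = find_order_numbers("Your order 4500012345 ref INV-2014/0621")
```

An invoice with a 0% rate or no tax gives the arithmetic check much less to work with, which is consistent with the higher error rate noted for those invoices. Fields such as the invoice number offer no pattern comparable to `ORDER_NUMBER` at all.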

Invoice scanning: OCR

Processing invoices with an invoice OCR tool requires technical/IT resources to configure, maintain and adapt templates and applications. Resources with an administrative profile are also needed for the scanning, validation and quality control tasks.

In summary, we could conclude that OCR is not a useful tool on its own, as it requires the automation of subsequent steps. Additionally, at a time when the use of electronic invoices is increasing significantly and the use of PDF invoices is becoming more widespread, the investment in scanning and OCR processes for invoices is only justified with a significant volume of invoices.

A volume of fewer than 250,000 paper invoices per year justifies neither the technological investment (software and hardware) nor the technical and operational resources needed to maintain and run the solution.

For paper invoices, a model based on outsourcing the whole process, including workflow, guarantees a short implementation period and brings savings of over 30% across the whole process.
