Optical Character Recognition (OCR) accuracy and speed have increased dramatically in recent years, and OCR now presents a viable and cost-effective means of automatically indexing documents. Service Point offer various solutions depending on the customer's requirements and the accuracy of recognition required.
OCR can be used for a variety of purposes. For instance,
a client may be performing a large backfile conversion project of
engineering documents and wish to index their drawings by description,
drawing number and issue. In this instance a method known as zone OCR would
most likely be used. As the name suggests, the OCR system is configured
to look at specific zones on the document; the information in each zone is then
extracted, recognised, verified and used to populate a database.
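The zone configuration described above can be sketched roughly as follows. This is an illustration only, not Service Point's actual system: the `Zone` class, the zone coordinates, the sample title block and the `fake_recognise` function are all hypothetical, and the "page" is a grid of text rows standing in for a scanned image so that the sketch runs without an OCR engine. In practice `recognise` would be replaced by a call to a real OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "page" here is a grid of text rows standing in for a scanned image,
# so that the sketch runs without an OCR engine installed.
Page = List[str]


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the page (hypothetical layout, in grid cells)."""
    name: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop(page: Page, zone: Zone) -> Page:
    """Cut the zone's rectangle out of the page."""
    return [row[zone.left:zone.right] for row in page[zone.top:zone.bottom]]


def fake_recognise(region: Page) -> str:
    """Stand-in for a real OCR engine call: joins and trims the cropped rows."""
    return " ".join(row.strip() for row in region if row.strip())


def index_document(
    page: Page,
    zones: List[Zone],
    recognise: Callable[[Page], str] = fake_recognise,
) -> Dict[str, str]:
    """Recognise each configured zone and return one database-ready record."""
    return {zone.name: recognise(crop(page, zone)) for zone in zones}


# Hypothetical title block of an engineering drawing.
TITLE_BLOCK = [
    "PUMP HOUSING ASSEMBLY         ",
    "DRG NO: A-10234    ISSUE: C   ",
]

# Hypothetical zone configuration for that title block.
ZONES = [
    Zone("description", 0, 0, 30, 1),
    Zone("drawing_number", 8, 1, 19, 2),
    Zone("issue", 26, 1, 30, 2),
]

record = index_document(TITLE_BLOCK, ZONES)
print(record)
```

Keeping the zone definitions separate from the recognition step means the same configuration can be reused across every drawing that shares a title block layout.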
For an operation of this kind it is imperative that
the accuracy of recognition is greater than 80%, as manually correcting
incorrect fields would otherwise take a great deal of time. For zone OCR
operations of this type we would also never recommend attempting to OCR fewer than
two data fields, because it is far easier to catch mistakes
when using two fields, whereas a mistake in a single field easily slips through the net.
If the data can also be verified against an existing database to ensure data
integrity, the OCR process is greatly enhanced because the accuracy can be checked
automatically.
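As a minimal sketch of this kind of automatic verification, the example below checks two recognised fields against an existing lookup table. Everything here is assumed for illustration: the `KNOWN_DRAWINGS` table, the field names and the sample records are hypothetical, and a real database would normally hold more than one issue per drawing.

```python
from typing import Dict, List, Tuple

# Hypothetical existing database: drawing number -> current issue letter.
KNOWN_DRAWINGS: Dict[str, str] = {
    "A-10234": "C",
    "A-10235": "B",
}


def verify(record: Dict[str, str], known: Dict[str, str]) -> Tuple[bool, str]:
    """Accept a record only if both OCR fields agree with the database."""
    number = record.get("drawing_number", "")
    issue = record.get("issue", "")
    if number not in known:
        return False, "drawing number not found"
    if known[number] != issue:
        return False, "issue does not match drawing number"
    return True, "ok"


def split_batch(
    records: List[Dict[str, str]], known: Dict[str, str]
) -> Tuple[List[Dict[str, str]], List[Tuple[Dict[str, str], str]]]:
    """Separate records that pass verification from those needing manual review."""
    accepted, review = [], []
    for record in records:
        ok, reason = verify(record, known)
        if ok:
            accepted.append(record)
        else:
            review.append((record, reason))
    return accepted, review


# Hypothetical OCR output for three drawings.
batch = [
    {"drawing_number": "A-10234", "issue": "C"},  # read correctly
    {"drawing_number": "A-1O234", "issue": "C"},  # zero misread as letter O
    {"drawing_number": "A-10235", "issue": "C"},  # 4 misread as 5: still a valid number
]

accepted, review = split_batch(batch, KNOWN_DRAWINGS)
print(len(accepted), "accepted;", len(review), "sent for manual correction")
for record, reason in review:
    print(record["drawing_number"], "->", reason)
```

The third record shows why a second field matters: the misread drawing number happens to be a valid one, so a check on that field alone would pass it, and only the mismatched issue exposes the error.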
Another example, where recognition need not be so accurate, is the conversion of large volumes of text (e.g. historical books or manuscripts). Here Service Point might scan and OCR the complete text without paying a great deal of attention to the accuracy of the data, since doing so would be cost prohibitive on a very large project.
We would then produce the documents as "PDF on text". This method
ensures that the original look and integrity of the data are kept whilst enabling
the document to be searched via the underlying text within the PDF file.
The 80% accuracy rule does not apply in this case as the term being searched
is;