Cognitive OpenOCR (CuneiForm) is open source Optical Character Recognition (OCR) software that automatically recognizes text in scanned or printed documents in twenty-three languages: English, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, French, German, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Serbian, Slovenian, Turkish, and Ukrainian. It also offers a mixed Russian-English mode.
CuneiForm can recognize almost any printed or typewritten typeface, with the exception of decorative fonts and handwritten text. The software has special algorithms for recognizing text from dot matrix printers, faxes, and photocopies of poorly typed originals. It automatically detects blocks of text, tables, and images, and preserves the layout of the page.
CuneiForm has a high recognition rate. Its performance is comparable to that of high-quality commercial software such as ABBYY FineReader. In fact, in my test I wasn’t able to tell which did the better job, ABBYY FineReader or CuneiForm.
CuneiForm does pretty much everything automatically. A wizard guides you through all stages of scanning and recognition and helps you reach the goal quickly. Just keep feeding the software scanned copies of text, or direct it to get the images from the scanner, and it will do the rest. If you wish, the software also lets you define regions to scan, regions to ignore, tables, columns, and so on.
After it finishes recognizing the text, a curious thing happens: it opens Microsoft Word right inside the program’s window so you can edit and proofread. If your computer does not have MS Word installed, the document opens in the program’s built-in text editor, which is by no means inferior to standard word processors.
- Quality recognition
- High speed
- Recognition of texts in 23 languages
- Rotation and inversion of images before recognition
- Recognition of tables of any structure and complexity
- Automatic saving of illustrations and tables in the output document
- Complete preservation of the page layout
- Support for batch-mode scanning and recognition
- Built-in text editor to work with the recognized text