Tuesday, March 1, 2011

Google Docs OCR Now Recognizes 34 Languages

Optical Character Recognition (OCR) in foreign languages is hard to come by: Cuneiform OCR recognizes 23 languages, but that’s the only such software I have come across. So it is wonderful news that Google Docs now supports 34 languages in the OCR functionality of its online office suite.

Optical Character Recognition, as you might remember, was introduced in Google Docs in June last year. OCR analyzes images and PDF files, typically produced by a scanner or a mobile phone camera, extracts the text and some of the formatting, and lets you edit the document in Google Docs.

Until today, Google Docs could recognize 5 languages. Another 29 have been added, taking the total to 34. Below is the full list of languages that Google Docs OCR currently supports:

  • Bulgarian
  • Catalan
  • Chinese (Simplified Han)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Filipino
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Lithuanian
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

So how do you use it? Simple. When uploading your images and PDF files to Google Docs, check the box “Convert text from PDF or image files to Google Docs documents” and tell it what language your documents are in.
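If you upload scans in bulk, it may help to check beforehand whether a document's language is on the list above. Here is a minimal Python sketch of such a check. It does not talk to Google Docs at all; the names `SUPPORTED_OCR_LANGUAGES` and `is_ocr_supported` are made up for this illustration, and the set simply mirrors the 34 languages listed in this post.

```python
# Languages listed in this post as supported by Google Docs OCR.
# This is a local copy for illustration, not data fetched from Google.
SUPPORTED_OCR_LANGUAGES = frozenset({
    "Bulgarian", "Catalan", "Chinese (Simplified Han)", "Croatian", "Czech",
    "Danish", "Dutch", "English", "Filipino", "Finnish", "French", "German",
    "Greek", "Hungarian", "Indonesian", "Italian", "Japanese", "Korean",
    "Latvian", "Lithuanian", "Norwegian", "Polish", "Portuguese", "Romanian",
    "Russian", "Serbian", "Slovak", "Slovenian", "Spanish", "Swedish", "Thai",
    "Turkish", "Ukrainian", "Vietnamese",
})


def is_ocr_supported(language: str) -> bool:
    """Return True if `language` is on the list above (case-insensitive)."""
    wanted = language.strip().casefold()
    return any(wanted == name.casefold() for name in SUPPORTED_OCR_LANGUAGES)
```

With this, `is_ocr_supported("german")` is true, while a language that is not on the list, such as Lao, comes back false, so you know not to expect usable text from the conversion.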


Related: Scan Tailor - Post-processing tool for scanned pages


  1. Last week I purchased SmartOCR and I sincerely recommend it to anyone looking for accurate OCR for a low price: http://smartocr.com

  2. Hi, I'm a computer science student in Lao PDR, and I'm interested in developing OCR for my language. Can you help me with what to do?
    Please tell me the steps to make it.

