About Aspose.OCR for Java

Add OCR and OMR Functionality to Java Applications

Aspose.OCR for Java is an optical character recognition (OCR) and optical mark recognition (OMR) API that lets developers add OCR and OMR functionality to their Java-based applications. It extracts text from images that use different fonts and styles, saving the time and effort of developing an OCR solution from scratch.

Supported File Formats

Images

  • PDF
  • JPEG
  • PNG
  • TIFF
  • GIF
  • Bitmap

Batch OCR

  • Multi-page PDF
  • ZIP
  • Folder

Recognition results

  • Text
  • PDF
  • Microsoft Word
  • Microsoft Excel
  • HTML
  • RTF
  • ePub
  • JSON
  • XML

Features and Capabilities

  • Photo OCR - Extract text from smartphone photos with scan-level accuracy.
  • Searchable PDF - Convert any scan into a fully searchable and editable document.
  • URL recognition - Recognize an image from URL without downloading it locally.
  • Bulk recognition - Read all images from multi-page documents, folders and archives.
  • Any font and style - Identify and recognize text in all popular typefaces and styles.
  • Fine-tune recognition - Adjust every OCR parameter for best recognition results.
  • Spell checker - Improve results by automatically correcting misspelled words.
  • Find text in images - Search for text or regular expression within a set of images.
  • Compare image texts - Compare texts on two images, regardless of the case and layout.
  • Worldwide - Extract text in any supported language with automatic language detection.
  • Key detail extraction - Automatically extract important details from ID cards.
  • Full Integration with Aspose Ecosystem - Integrate OCR seamlessly with other Aspose products for a comprehensive and efficient Java solution.
  • 140+ Recognition Languages - The Java OCR API recognizes text in multilingual documents, such as Chinese/English, Arabic/French, or Cyrillic/English. The following languages are supported:
    • Extended Latin: English, Spanish, French, Indonesian, Portuguese, German, Vietnamese, Turkish, Italian, Polish, and 80+ more.
    • Cyrillic alphabet: Russian, Ukrainian, Kazakh, Bulgarian, including mixed Cyrillic/English texts.
    • Arabic, Persian, Urdu, including texts mixed with English.
    • Chinese, Korean, Japanese, Devanagari, and Dravidian languages, including Hindi, Tamil, Marathi, and others.
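
The "Find text in images" and "Compare image texts" features above both operate on text that has already been recognized. As a rough illustration of the idea only, the sketch below works on plain Java strings and uses nothing but the standard library. It does not call Aspose.OCR, and the class and method names (RecognizedTextTools, normalize, sameText, findImages) are hypothetical, chosen for this example. It assumes the recognized text of each image is already available as a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Aspose.OCR: it only post-processes
// strings that an OCR step has already produced.
final class RecognizedTextTools {

    private RecognizedTextTools() {}

    // Lower-case the text and collapse every whitespace run into a single
    // space, so letter case, line breaks and column layout are ignored.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // True when two recognized texts are equal after normalization.
    static boolean sameText(String first, String second) {
        return normalize(first).equals(normalize(second));
    }

    // Returns the names of the images whose recognized text contains a match
    // for the pattern, in the iteration order of the given map.
    static List<String> findImages(Map<String, String> textByImage, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : textByImage.entrySet()) {
            if (pattern.matcher(entry.getValue()).find()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Under these assumptions, sameText treats "Invoice  No. 42" split across two lines and "invoice no. 42" on one line as the same text, and findImages with Pattern.compile("ID: \\d+") returns only the images whose text contains an ID number. The library's own search and comparison features may work differently.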