About Aspose.OCR for Python via .NET

Optical Character Recognition API for Python

Aspose.OCR for Python via .NET is a powerful yet easy-to-use optical character recognition (OCR) engine for your Python applications and notebooks. In fewer than 5 lines of code, you can recognize text in more than 140 languages, including English, Cyrillic, Arabic, Persian, Hindi, Chinese, Japanese, Korean, Tamil, and many more, and return the results in the most popular document and data interchange formats. The library works equally well with scanned images, smartphone photos, screenshots, and scanned PDFs. Pre-processing filters help you handle rotated, skewed, and noisy images.

Supported File Formats

Images

  • JPEG
  • PNG
  • TIFF
  • BMP
  • GIF

Batch OCR

  • Multi-page PDF
  • DjVu
  • ZIP
  • Folder

Recognition results

  • Text
  • PDF
  • Microsoft Word
  • Microsoft Excel
  • HTML
  • RTF
  • ePub
  • JSON
  • XML

Features and capabilities

  • Photo OCR - Extract text from smartphone photos with scan-level accuracy.
  • Searchable PDF - Convert any scan into a fully searchable, indexable and editable document.
  • URL recognition - Recognize an image from a URL without downloading it locally.
  • Bulk recognition - Read all images from multi-page documents, folders and archives.
  • Any font and style - Identify and recognize text in all popular typefaces and styles.
  • Fine-tune recognition - Adjust every OCR parameter for best recognition results.
  • Spell checker - Improve results by automatically correcting misspelled words.
  • Find text in images - Search for text or a regular expression within a set of images.
  • Compare image texts - Compare the texts on two images, regardless of case and layout.
  • 140+ Recognition Languages - The Python and .NET OCR API recognizes text in multilingual documents, such as Chinese/English, Arabic/French, or Cyrillic/English. The following languages are supported:
    • Extended Latin: English, Spanish, French, Indonesian, Portuguese, German, Vietnamese, Turkish, Italian, Polish, and 80+ more.
    • Cyrillic alphabet: Russian, Ukrainian, Kazakh, Bulgarian, including mixed Cyrillic/English texts.
    • Arabic, Persian, Urdu, including texts mixed with English.
    • Chinese, Korean, Japanese, and Devanagari-script and Dravidian languages, including Hindi, Tamil, Marathi, and others.
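
The "Find text in images" and "Compare image texts" features listed above can be pictured with a small, library-independent sketch. It does not call Aspose.OCR and is not how the library implements these features: it assumes the text has already been recognized and is held in plain Python strings, and the helper names (normalize, texts_match, find_in_texts) and the sample file names are hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Case-fold the text and collapse every whitespace run to a single space."""
    return " ".join(text.casefold().split())


def texts_match(a: str, b: str) -> bool:
    """Compare two recognized texts while ignoring case and layout (line breaks, spacing)."""
    return normalize(a) == normalize(b)


def find_in_texts(texts: Dict[str, str], pattern: str) -> List[str]:
    """Return the names of the images whose recognized text matches the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name, text in texts.items() if rx.search(text)]


# Hypothetical recognition results, keyed by image name (assumed data, not real OCR output).
recognized = {
    "invoice_front.png": "Invoice No. 1042\nTotal:   18.50 EUR",
    "invoice_copy.jpg": "INVOICE NO. 1042 total: 18.50 eur",
    "receipt.png": "Thank you for your purchase",
}

same = texts_match(recognized["invoice_front.png"], recognized["invoice_copy.jpg"])
hits = find_in_texts(recognized, r"\b\d+\.\d{2}\s+EUR\b")
print(same, hits)
```

Normalizing before comparing is what makes the comparison insensitive to case and layout; the library's own comparison and search features may apply different or more elaborate rules.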