About GroupDocs.Parser for .NET

Extract text and formatted text from files in a wide range of formats

GroupDocs.Parser for .NET is an API for extracting text, metadata and images, designed for business applications built with C#, ASP.NET and other .NET technologies. It extracts raw, formatted and structured text, as well as metadata, from files in the supported formats. Applications that use it can also parse password-protected documents in popular formats such as Microsoft Word documents, Excel spreadsheets, PowerPoint presentations, OneNote files, PDF files and ZIP archives.
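To illustrate the text extraction and password-protected parsing described above, here is a minimal C# sketch. It assumes the `GroupDocs.Parser` NuGet package is referenced. `Parser`, `LoadOptions` and `GetText()` follow the library's published examples, while the `DocumentText` class and its `ExtractText` method are hypothetical names used only for this illustration.

```csharp
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Hypothetical helper class for illustration; it is not part of GroupDocs.Parser.
public static class DocumentText
{
    // Returns the document's raw text, or null if the format does not support text extraction.
    // Pass a password only for protected files (for example, an encrypted DOCX or PDF).
    public static string ExtractText(string filePath, string password = null)
    {
        // LoadOptions carries the password; unprotected files use the plain constructor.
        using (Parser parser = password == null
            ? new Parser(filePath)
            : new Parser(filePath, new LoadOptions(password)))
        // GetText() returns null for unsupported formats; a null resource in "using" is allowed.
        using (TextReader reader = parser.GetText())
        {
            return reader?.ReadToEnd();
        }
    }
}
```

For example, `DocumentText.ExtractText("report.docx")` would read an unprotected file, and `DocumentText.ExtractText("protected.pdf", "secret")` would open an encrypted one. The file names and password are placeholders. Returning `null` instead of throwing mirrors `GetText()`, which returns `null` when a format does not support text extraction.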

Supported file formats

Microsoft Office formats

  • Word: DOCX, DOC, DOCM, DOT, DOTX, DOTM, RTF
  • Excel: XLSX, XLS, XLSM, XLSB, XLTM, XLT, XLTX, XLAM, SXC, SpreadsheetML
  • PowerPoint: PPT, PPTX, PPS, PPSX, PPSM, POT, POTM, POTX, PPTM

PDF, images and OpenDocument formats

  • Portable Document Format: PDF
  • Images: JPG, BMP, PNG, TIFF, GIF
  • OpenDocument: ODT, OTT, OTS, ODS, ODP, OTP, ODG

Other formats

  • Web: HTML, MHTML
  • Archives: ZIP, TAR, 7Z
  • Ebooks: CHM, EPUB, FB2, MOBI

GroupDocs.Parser for .NET features

  • Extract text - Extract text from office documents, PDF files, images and other supported formats for reading and analysis.
  • Extract images - Retrieve images embedded in office documents, PDF files and other supported formats.
  • Scan QR codes - Detect and decode QR codes in office documents, PDF files and images.
  • Extract data from email attachments and archives - Extract data from email messages, their attachments and compressed archives.
  • Extract tables - Detect tables in PDF documents and extract their data in structured form.
  • Extract hyperlinks - Find and extract hyperlinks and email addresses in office documents and PDF files.
  • Parse PDF forms - PDF forms are documents with fillable fields in which users enter information electronically. The .NET API extracts the entered data from these forms for further processing.
  • Parse data by templates - Define custom templates and use them with the .NET API to extract specific fields from PDF files.
  • Search text in documents - Quickly find specific words or patterns in documents.
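The text search feature listed above can be sketched the same way. This again assumes the `GroupDocs.Parser` NuGet package. `Parser.Search` and `SearchResult` (with its `Position` and `Text` properties) follow the library's published examples, while the `KeywordSearch` class and `PrintMatches` method are hypothetical names for this illustration, not a complete search implementation.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Hypothetical helper class for illustration; it is not part of GroupDocs.Parser.
public static class KeywordSearch
{
    // Prints each match with its character position in the document text.
    // Returns the number of matches, or -1 if search is not supported for this format.
    public static int PrintMatches(string filePath, string keyword)
    {
        using (Parser parser = new Parser(filePath))
        {
            // Search returns null when the document format does not support search.
            IEnumerable<SearchResult> results = parser.Search(keyword);
            if (results == null)
            {
                return -1;
            }

            int count = 0;
            foreach (SearchResult result in results)
            {
                Console.WriteLine($"At {result.Position}: {result.Text}");
                count++;
            }
            return count;
        }
    }
}
```

A call such as `KeywordSearch.PrintMatches("contract.pdf", "invoice")` (placeholder arguments) prints one line per match. The sketch calls `Search` without any search options, so the library's default matching behavior applies.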