About GroupDocs.Search for Java

Document Search and Indexing

GroupDocs.Search for Java enables text search in Java applications. You can create and merge multiple indexes and search them using simple, Boolean, regular expression (regex), fuzzy, and other types of queries. Because GroupDocs.Search for Java supports a wide range of popular file formats, you can retrieve information from files, documents, emails, and archives.
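
The query types described above can be illustrated with a small, self-contained sketch. It is a conceptual model in plain Java, not the GroupDocs.Search API. The class MiniIndex and its methods (add, merge, search, searchRegex, and, or, not) are hypothetical names chosen for this example. The sketch assumes a simple inverted index that maps each lower-cased term to the set of documents containing it. For the library's actual classes and query syntax, consult the GroupDocs.Search documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Conceptual sketch only: a tiny in-memory inverted index.
// This is NOT the GroupDocs.Search API; MiniIndex and all of its methods are hypothetical.
class MiniIndex {
    // Maps each lower-cased term to the sorted set of document ids that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();
    // Every document id added to this index, used to evaluate NOT.
    private final Set<String> allDocs = new TreeSet<>();

    // Splits text on runs of non-letter, non-digit characters and records each term.
    void add(String docId, String text) {
        allDocs.add(docId);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Merges another index into this one by unioning the posting sets term by term.
    void merge(MiniIndex other) {
        allDocs.addAll(other.allDocs);
        other.postings.forEach((term, docs) ->
                postings.computeIfAbsent(term, k -> new TreeSet<>()).addAll(docs));
    }

    // Simple query: documents containing the (case-insensitive) term.
    Set<String> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
    }

    // Regex query: documents containing any indexed term that fully matches the pattern.
    Set<String> searchRegex(String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> result = new TreeSet<>();
        postings.forEach((term, docs) -> {
            if (pattern.matcher(term).matches()) {
                result.addAll(docs);
            }
        });
        return result;
    }

    // Boolean AND: documents present in both result sets (intersection).
    Set<String> and(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Boolean OR: documents present in either result set (union).
    Set<String> or(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Boolean NOT: every indexed document that is not in the given result set.
    Set<String> not(Set<String> a) {
        Set<String> result = new TreeSet<>(allDocs);
        result.removeAll(a);
        return result;
    }
}
```

In this model, Boolean operators reduce to intersection, union, and complement over posting sets. Merging two indexes is a term-by-term union. This is the general idea behind inverted-index search; the library's internal implementation may differ.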

Supported File Formats

Microsoft Office Formats

  • Word: DOC, DOCX, DOCM, DOT, DOTX, DOTM
  • Excel: XLS, XLSX, XLSM, XLT, XLTX, XLTM, XLSB, XLA, XLAM, CSV, TSV
  • PowerPoint: PPT, PPTX, POT, POTX, PPS, PPSX, PPTM, PPSM, POTM
  • Project: MPP
  • Diagram: VSD, VSS
  • Microsoft Compiled HTML: CHM
  • OneNote: ONE

OpenDocument and Other Formats

  • Portable Document Format: PDF
  • OpenDocument: ODT, OTT, OTS, ODS, ODP
  • Email: PST, OST, MSG, EML, EMLX
  • Web File Formats: XML, HTM, HTML, XHTML, MHT, MHTML
  • Audio: MP3, WAV
  • Video: AVI, MOV, QT, FLV, ASF
  • Text: TXT
  • Rich Text Format: RTF
  • Markdown Documentation File: MD
  • Images: BMP, GIF, JP2, PNG, WEBP, TIFF, EMF, WMF, JPG, PSD
  • Other Formats: TORRENT, ZIP, DCM, DJVU, EPUB, FB2

GroupDocs.Search for Java Features

  • Build an index on disk or in memory.
  • Exclude specific files from indexing.
  • Index asynchronously using multiple threads.
  • Save space using compact indexing.
  • Retrieve lists of indexed files contained in archives.
  • Extract document text from the index or from the source file.
  • Search using regular expressions (regex).
  • Perform search operations.
  • Configure the similarity level for fuzzy search.
  • Configure fuzzy search to return only the best results.
  • Use faceted and Boolean search together.
  • Configure and perform synonym search.
  • Index password-protected documents.
  • Index email messages from Outlook.
  • Skip specific words during indexing to index faster.
  • Spell-check search queries.
  • Use search phrases containing wildcards.
  • Combine multiple queries into a single query object tree.
  • Split a search into smaller chunks to search very large indexes quickly.
  • Index documents from streams and data structures.
  • Set up document filtering in search results.
  • Add English synonyms to the default synonym dictionary.
  • Count the exact number of occurrences of each found word, which is used to suggest alternative words when a query word is misspelled.
  • Add text attributes to indexed documents without re-indexing.
  • Perform character-based indexing and searching.
  • Index metadata of non-textual document formats.
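
The fuzzy search options listed above are a configurable similarity level and a "best results only" mode. The sketch below illustrates both under stated assumptions and is not GroupDocs.Search code. It assumes similarity is defined as 1 minus the Levenshtein edit distance divided by the length of the longer term. The class FuzzyMatcher and its methods are hypothetical names, and the library may measure similarity differently.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Conceptual sketch only, NOT the GroupDocs.Search API.
// FuzzyMatcher, editDistance, similarity, and match are hypothetical names.
final class FuzzyMatcher {
    private FuzzyMatcher() {}

    // Levenshtein edit distance computed with two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()]; // after the final swap, prev holds the last row
    }

    // Similarity in [0, 1]; 1.0 means the strings are identical.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Returns vocabulary terms whose similarity to the query is at least minSimilarity,
    // in alphabetical order. With bestOnly, keeps only the terms tied for the top score.
    static List<String> match(String query, Collection<String> vocabulary,
                              double minSimilarity, boolean bestOnly) {
        List<String> hits = new ArrayList<>();
        double best = -1.0;
        for (String term : vocabulary) {
            double s = similarity(query, term);
            if (s < minSimilarity) {
                continue;
            }
            if (bestOnly) {
                if (s > best) {
                    best = s;
                    hits.clear(); // a better score discards earlier, weaker hits
                }
                if (s == best) {
                    hits.add(term);
                }
            } else {
                hits.add(term);
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

For example, with a similarity level of 0.8 the misspelled query "serch" matches "search" (one edit over six characters, similarity about 0.83) but not "starch" (two edits, about 0.67). Normalizing the edit distance by term length lets a single threshold work for both short and long words.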