PDF Library features text/image extraction capabilities.

Press Release Summary:



Big Faceless PDF Library v2.6.2 enables users to extract text and bitmap images from PDF documents and to index PDFs using the Apache Lucene search engine. At a rate of roughly 50 pages/sec for large documents, the software extracts and indexes Unicode text from form fields, annotations, and document metadata as well as the document body. Areas of use include data mining, content management systems, and form processing environments.



Original Press Release:



BFO adds Text Extraction to PDF Library



London, England, 27 October 2005 - BFO (Big Faceless Organization), a global supplier of Java reporting solutions, strengthens the acclaimed Big Faceless PDF Library with the addition of text and image extraction.

The 2.6.2 release adds the ability to extract text and bitmap images from PDF documents, as well as to index PDFs using the Apache Lucene search engine. The library extracts and indexes Unicode text from form fields, annotations and document metadata as well as the document body, at roughly 50 pages a second for large documents.

The speed and accuracy of text extraction, coupled with the existing features of the PDF Library, make it a wise choice for developers involved in data mining, content management systems and form processing environments. It is also beneficial in settings that require the ability to search or extract text from large numbers of PDF files.

Text and image extraction requires the Big Faceless PDF Library Extended Edition plus Viewer license, which can be downloaded from BFO's website.

About BFO: BFO, founded in 1998, is a leading global provider of Java-based reporting solutions. It produces a stable of robust Java components for the international B2B market, including the Report Generator, the Graph Library and the PDF Library. The Report Generator incorporates both libraries and converts XML to PDF documents. Using JSP, ASP or similar technology, it is possible to create dynamic PDF reports as quickly and easily as HTML.
