Software creates searchable PDF files.

September 26, 2008 - ArchivistaBox 2008/IX web-based DMS can generate searchable PDF files directly from scanned pages. Generated PDF files are stored in Archivista database and automatically indexed, allowing whole document stock to be researched. Sensitive data can be encrypted before being made available. Supporting more than 20 languages, open source solution can handle large volumes of data.

(Archive News Story - Products mentioned in this Archive News Story may or may not be available from the manufacturer.)
Original Press release

Archivista GmbH
Zuerichstrasse 80
Pfaffhausen
Switzerland



ArchivistaBox 2008/IX: The World's First Open Source Text Recognition with Searchable PDF Files


PFAFFHAUSEN, Switzerland, September 19 -- With the launch of the ArchivistaBox 2008/IX, Archivista, a Swiss open source software company, has released the only open source text recognition software worldwide that can create searchable PDF files.

The majority of current text recognition or OCR (optical character recognition) programs run only on Windows systems and can be purchased for prices from around 100 Euro upwards. When thousands or millions of pages are to be processed, however, expensive volume licenses that are based on a price per scanned page are required.

The ArchivistaBox is a web-based DMS (document management system) that can be installed on any commercially available computer. Depending on the hardware used, the page volume processed can range from several thousand to several million pages per day.

The release of version 2008/IX marks the launch of the first open source text recognition system that is able to generate searchable PDF files directly from scanned pages. More than 20 languages are available, and the recognition quality (>99 percent) is comparable with that of commercial systems.

PDF files generated with the ArchivistaBox are stored in an Archivista database and automatically indexed, allowing the whole document stock to be searched. Scanned documents can be called up with a web browser at any time. Sensitive data can be encrypted before being made available. If required, the ArchivistaBox can create complete DVD publications.
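To illustrate what automatic indexing makes possible, here is a minimal sketch of a full-text index over recognized text. It is not the Archivista implementation, which the release does not describe; the function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output for three scanned documents.
documents = {
    1: "Invoice for scanner hardware",
    2: "Contract for scanner maintenance",
    3: "Invoice for office supplies",
}
index = build_index(documents)
```

Calling `search(index, "invoice scanner")` returns only the documents that contain both words, which is the kind of lookup an indexed document stock allows.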

100 percent of the source code used in the ArchivistaBox is released under the GPLv2 license. The Tesseract (including Fraktur / black-letter recognition) and Cuneiform (Linux port, BSD license) OCR engines are used for text recognition. The hocr2pdf module (see http://www.exactcode.de) is used to generate the searchable PDF files.
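The two-step chain described above (an OCR engine produces hOCR output, then hocr2pdf combines it with the scanned image) can be sketched roughly as follows. This is an illustration and not the ArchivistaBox code: it assumes that the `tesseract` and `hocr2pdf` command-line tools are installed, that the installed Tesseract version supports the `hocr` output configuration, and that the hOCR file receives the `.hocr` suffix (some Tesseract versions write `.html` instead). The helper names and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(image, lang="deu", hocr_suffix=".hocr"):
    """Build the two command lines for the OCR step and the PDF step.

    Assumption: hocr_suffix matches what the installed Tesseract writes.
    """
    image = Path(image)
    base = image.with_suffix("")          # e.g. scan.tif -> scan
    hocr = Path(str(base) + hocr_suffix)  # hOCR file written by Tesseract
    pdf = Path(str(base) + ".pdf")        # searchable PDF to create
    # Step 1: Tesseract recognizes the text and writes it as hOCR.
    ocr_cmd = ["tesseract", str(image), str(base), "-l", lang, "hocr"]
    # Step 2: hocr2pdf reads the hOCR from standard input and places the
    # recognized text in the PDF together with the page image.
    pdf_cmd = ["hocr2pdf", "-i", str(image), "-o", str(pdf)]
    return ocr_cmd, pdf_cmd, hocr, pdf


def make_searchable_pdf(image, lang="deu"):
    """Run both steps for one scanned page and return the PDF path."""
    ocr_cmd, pdf_cmd, hocr, pdf = build_commands(image, lang)
    subprocess.run(ocr_cmd, check=True)
    with open(hocr, "rb") as hocr_file:
        subprocess.run(pdf_cmd, stdin=hocr_file, check=True)
    return pdf
```

With both tools installed, `make_searchable_pdf("scan.tif", lang="eng")` would be expected to leave a `scan.pdf` next to the scanned image.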

The ArchivistaBox 2008/IX CD (700 MByte) can be downloaded from https://sourceforge.net/projects/archivista/ or http://www.archivista.ch.

Source: Archivista GmbH

Copyright © 2013 Thomas Publishing Company. All Rights Reserved.