docWorks is a software program used by the most renowned libraries, publishing houses, and companies worldwide to digitize and convert their valuable library holdings and archives for easy access, searchability, and long-term preservation.
The term “Digital Library” does not fully describe the actual result. “Digital” refers to the transformation of printed documents into digital images, i.e., scanning pages to produce JPG, TIFF, or PDF files. However, to create a truly searchable digital library, these images need to be “converted” into intelligent units using OCR (text recognition) and zoning (e.g., the identification of different articles on a newspaper page). docWorks is the only software that bundles all necessary conversion steps in a single, smooth workflow. Various docWorks editions guarantee a custom-fit solution for your project, be it a small collection or a National Library.
Advantages of a Digital Library
– Easy access to a worldwide audience
– Better, faster, and more comprehensive search
– Foundation for long-term preservation
– Holdings are available for second-cycle exploitation
– Single, smooth workflow with central control center
– Time savings due to streamlined and automated process
– No expensive errors from false copying or lost data shipments
– Consistent, standardized output
– Easily scalable, from thousands to millions of pages
How does docWorks work?
docWorks “converts” scanned images. More specifically, docWorks identifies the information contained in scanned pages (such as text and structure), saves this information in an XML file, and adds this file to the image. The two essential conversion steps are OCR and segmentation of the document into logical units (articles, chapters, etc.). OCR is what makes a scanned page searchable, while zoning and structure recognition ensure that only relevant search results are displayed. For instance, if no zoning or structure is applied, a multiple-word search within newspapers might return thousands of results, because the individual search words only need to appear somewhere on the same newspaper page. Segmenting the page into its articles ensures that a result is returned only when all search words occur in the same article.
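The difference between page-level and article-level search can be illustrated with a small Python sketch. It is not docWorks code: the `Article` class, the two helper functions, and the sample texts are hypothetical, and simple substring matching stands in for a real search engine.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """One logical unit as zoning might produce it (hypothetical structure)."""
    page: int
    title: str
    text: str


def page_level_hits(articles, words):
    """Return pages on which every search word occurs somewhere on the page."""
    pages = {}
    for article in articles:
        pages.setdefault(article.page, []).append(article.text.lower())
    hits = []
    for page, texts in sorted(pages.items()):
        combined = " ".join(texts)
        if all(word.lower() in combined for word in words):
            hits.append(page)
    return hits


def article_level_hits(articles, words):
    """Return titles of articles that contain every search word themselves."""
    return [
        article.title
        for article in articles
        if all(word.lower() in article.text.lower() for word in words)
    ]


# Invented sample data: two articles on page 1, one on page 2.
articles = [
    Article(1, "Harbour expansion approved",
            "The city council approved the harbour expansion."),
    Article(1, "Football results",
            "The local club won the cup final on Sunday."),
    Article(2, "Council wins cup",
            "The council team won the charity cup."),
]

query = ["council", "cup"]
print(page_level_hits(articles, query))     # [1, 2]
print(article_level_hits(articles, query))  # ['Council wins cup']
```

Without segmentation, page 1 counts as a hit even though “council” and “cup” come from two unrelated articles. With segmentation, only the article that contains both words is returned.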
The conversion process runs through several steps. Following import, the scanned images are “cropped,” meaning they are cut to a consistent size. This step is followed by zoning (segmentation of the page into classified blocks and columns) and then by the editing of structure (paragraph, chapter, article), text correction, and metadata. Next comes a standardized output of the data in METS/ALTO files, which are stored in the archive and fed into the presentation system. Every workflow step consists of an automatic analysis executed by docWorks and a manual correction. This correction can be done by the docWorks user, or it can be outsourced to specialized service partners. Important: If the correction is outsourced, the service partner receives special online access that allows them to process the data from anywhere in the world. The actual data always remains with the docWorks client.
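To give an impression of what a standardized ALTO output makes possible, the following Python sketch reads the recognized text per text block from an ALTO-style file. The XML fragment is hand-written and heavily simplified (real ALTO files also carry coordinates, IDs, and further metadata) and is not actual docWorks output. The function name `block_texts` is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ALTO-style fragment for illustration only.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Harbour"/>
            <SP/>
            <String CONTENT="expansion"/>
          </TextLine>
          <TextLine>
            <String CONTENT="approved"/>
          </TextLine>
        </TextBlock>
        <TextBlock ID="TB2">
          <TextLine>
            <String CONTENT="Football"/>
            <SP/>
            <String CONTENT="results"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

# Namespace prefix mapping used in the XPath-style queries below.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}


def block_texts(alto_xml):
    """Map each TextBlock ID to the text of its String elements."""
    root = ET.fromstring(alto_xml)
    result = {}
    for block in root.iterfind(".//alto:TextBlock", NS):
        lines = []
        for line in block.iterfind("alto:TextLine", NS):
            # Each recognized word is stored in the CONTENT attribute.
            words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", NS)]
            lines.append(" ".join(words))
        result[block.get("ID")] = " ".join(lines)
    return result


print(block_texts(ALTO_SAMPLE))
# {'TB1': 'Harbour expansion approved', 'TB2': 'Football results'}
```

Because the text is stored per block rather than per page, a presentation system can index and highlight individual blocks. How blocks are grouped into articles is described in the accompanying METS file, which this sketch does not cover.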
Presentation systems and app
We work with a selection of systems that help you present your Digital Library online in the best possible way. Besides the standard website solution, we have developed a special app that provides a whole new way to experience digital publications. The app offers a thumbnail preview, an automatically created table of contents, and text search. Table of contents entries and search results link directly to their respective sources, which are highlighted in color. Thanks to the gesture control of tablets, you can leaf through digital publications as you would through an actual book. This allows you to present valuable content to an interested audience in a simple, modern, and visually engaging way. Alternatively, you can use the app as an internal communications tool to inform colleagues and stakeholders about the advantages and progress of your digitization efforts. The app is designed to grow continuously through the addition of topics, pictures, texts, or whole publications.
Please visit Content Conversion Specialists, the makers of docWorks, for more information.