Chris Riley, Independent Consultant

Is there a stack of papers somewhere in your office? When was the last time you touched or searched the stack? Is there information in those documents that requires your attention? We assimilate paper daily and struggle with managing it. The frustration, time cost, and risk of loss that paper poses for everyone are among the most critical reasons to become paperless.

Only a small percentage of individuals are scanning their documents. Though we all benefit from the technology, it’s sometimes not clear how to create a solution. In my office, a document’s life span from the time I first touch it to the point I destroy it is around 5 minutes. For three years now I have scanned 98 percent of all paper documents I encounter. This article is a crash course on how you can easily create a paperless office.

There are many ways to become paperless. The first is to push for more digital content in place of paper, usually by switching to paperless bill statements and insisting on email over fax or physical paper. The disappearance of paper has not happened at the rate the experts expected. On the contrary, it has been shown that people used to working with paper print more now than ever before. Becoming paperless means first changing a mind-set and second, most importantly, finding the tools to digitize paper efficiently. What I propose is a tried and true paperless system. There are several variations you can deploy, but the bottom line is this: start scanning, converting, and storing, and stop dealing with paper.

What you will need:

  • Computer: A spare, unattended computer with an operating system installed and a connection to your network. It does not even need a monitor. This computer will become your digital file cabinet.
  • Search: Desktop search tool that supports browser based searches within a network.
  • Scanner: A document scanner, preferably one with an automatic document feeder and capable of scanning both sides of a page (duplex scanning).
  • Conversion Software: An Optical Character Recognition (OCR) software product that supports Hot Folders, sometimes called Watch Folders. OCR is the technology that converts images to usable text, which allows you to edit and search the content.
  • Backup: A software backup tool and an external hard drive to periodically save all files from the digital file cabinet.

Preparing Your Digital File Cabinet

Once you have the tools, the construction can begin. Typically a digital file cabinet can be set up in just a few hours.

First, install your search tool, conversion software, and backup software on your digital file cabinet computer. The search tool should be set up to index only one folder, named something like “File Cabinet”, and no other directories. You may choose to have it index the entire machine, but ideally you want to keep it simple and make the computer a standalone content device. Picture this computer as the replacement for your paper file cabinet, where you accumulate data-rich digital files instead of paper. The advantage of a standalone device is that it makes transport and backup easy and reliable.

Second, make sure your digital file cabinet computer, or at least the “File Cabinet” folder, is visible on the network. Once content is saved to your digital file cabinet, you will access it across the network through your search tool. With most search tools, you open a web browser and go to a specific URL on your network. Some tools instead provide a thin software client for search.

Third, your backup software should be configured to periodically back up the “File Cabinet” folder to an external hard drive connected to your digital file cabinet. There are many backup scheduling tools out there, and most operating systems now include one as standard. You may also choose redundancy with multiple external drives, including one that can be moved to a remote location. Ideally you would have three backups, with one stored at an external location.
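For readers who would rather script this step than rely on a scheduling tool, the following is a minimal sketch of the idea, not a replacement for proper backup software. The function name and the example paths are assumptions made for illustration. It copies the whole “File Cabinet” folder into a new, timestamped folder on the backup drive so that earlier backups are never overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file_cabinet(source: Path, backup_root: Path) -> Path:
    """Copy the whole source folder into a new timestamped folder under backup_root."""
    if not source.is_dir():
        raise FileNotFoundError(f"Source folder not found: {source}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{source.name}-{stamp}"
    # copytree creates the destination folder and refuses to run if it
    # already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, destination)
    return destination


# Example call (hypothetical paths, adjust to your own machine):
# backup_file_cabinet(Path("File Cabinet"), Path("/mnt/backup-drive"))
```

A script like this could be run on a schedule by the operating system's built-in task scheduler.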

Setting Up Your Document Scanner

Once all the software is installed, attach your document scanner to the digital file cabinet and make sure that it is functional. Your document scanner should be configured to scan directly to an empty directory called “Input”, which we will discuss in a moment. In your scan settings, choose the TIFF file format, as this is ideal for conversion, and scan duplex at 300 DPI resolution. These settings give the best balance of OCR accuracy, speed, and image quality. You can choose to scan in color, greyscale, or black and white. If you plan to re-purpose a document in the future, scan at least in greyscale, and preferably in color.

Configuring Document Conversion and Automation

The automatic OCR processing product is configured to pick up images from the folder your scanner is scanning to, which we called “Input”. Because the OCR product has hot folder functionality, an image is converted as soon as it arrives in this folder. Conversion settings are a personal preference, but the two most recommended formats are PDF with a searchable text layer and a document file in your favorite word processor format. The result of the OCR conversion is saved into the “File Cabinet” folder, where it is automatically indexed by your search tool. You have now created a digital representation of your file that is 100% searchable.
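To make the hot folder idea concrete, here is a rough sketch of what such a product does behind the scenes. It is an illustration only: the function names, the “Processed” folder, and the polling interval are assumptions, and convert_placeholder merely copies the file where a real OCR product would perform recognition.

```python
import shutil
import time
from pathlib import Path
from typing import Callable

TIFF_SUFFIXES = {".tif", ".tiff"}


def convert_placeholder(image: Path, output_dir: Path) -> Path:
    """Hypothetical stand-in for the OCR step; no text recognition happens here."""
    result = output_dir / (image.stem + ".pdf")
    shutil.copyfile(image, result)
    return result


def process_input_folder(
    input_dir: Path,
    output_dir: Path,
    done_dir: Path,
    convert: Callable[[Path, Path], Path] = convert_placeholder,
) -> list[Path]:
    """Convert every TIFF waiting in input_dir, then move the original aside."""
    output_dir.mkdir(parents=True, exist_ok=True)
    done_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for image in sorted(input_dir.iterdir()):
        if not image.is_file() or image.suffix.lower() not in TIFF_SUFFIXES:
            continue
        results.append(convert(image, output_dir))
        # Moving the original out of the watch folder prevents reprocessing.
        shutil.move(str(image), str(done_dir / image.name))
    return results


def watch(input_dir: Path, output_dir: Path, done_dir: Path, interval_seconds: float = 5.0) -> None:
    """Poll the watch folder forever; stop with Ctrl+C."""
    while True:
        process_input_folder(input_dir, output_dir, done_dir)
        time.sleep(interval_seconds)
```

A real product also has to cope with files that the scanner is still writing when the folder is checked, which this sketch ignores.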

Access Your New File Cabinet

Your search tool indexes files as they arrive, so the files can be retrieved by search term from any computer on the same network. For example, if you converted your latest cell phone bill, you can search for the carrier name and all PDF or document files containing that name will appear. To find the latest bill, simply sort by date and it will be at the top of the list.
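The search-then-sort behavior can be illustrated with a toy example. This is only a sketch: it assumes the converted text is available as plain .txt files, whereas a real desktop search tool builds an index and reads PDF and word processor formats directly. The function name is made up for this example.

```python
from pathlib import Path


def search_cabinet(cabinet: Path, term: str) -> list[Path]:
    """Return text files containing term (case-insensitive), newest first."""
    needle = term.lower()
    matches = []
    for path in cabinet.rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            matches.append(path)
    # Sorting by modification time, newest first, puts the latest bill on top.
    matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return matches
```

Searching for the carrier name would then return every matching file, with the most recently saved one first.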

Ways to Make It Even More Robust

There are several additional settings you can deploy to make the process even more robust. The most common is to separate documents meant for long-term storage, which need no attention, from documents that need attention within a set period of time. To do so, leverage your document scanner’s multiple destination settings. Most document scanners come with an LED display that allows you to choose a number before you scan. Each number is associated with custom settings. For the digital file cabinet, the settings would be an input folder, an OCR watch folder, and a folder that is indexed by the search tool. By using multiple folders, you can scan all documents not requiring action to one folder, and all bills to another. When a bill is paid, move it to your general “File Cabinet” folder. Any combination of workflows can be created depending on the nature of the documents you receive.
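One way to picture these numbered destinations is as a small lookup table. The sketch below is purely illustrative: the profile numbers, the folder names other than “Input” and “File Cabinet”, and the function name are all assumptions, and the actual settings live in your scanner and OCR software rather than in a script.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ScanProfile:
    name: str
    input_dir: Path    # folder the scanner writes images to
    watch_dir: Path    # folder the OCR product watches (often the same folder)
    indexed_dir: Path  # folder the search tool indexes


# Hypothetical profiles, keyed by the number chosen on the scanner.
PROFILES = {
    1: ScanProfile("archive", Path("Input"), Path("Input"), Path("File Cabinet")),
    2: ScanProfile("bills", Path("Input Bills"), Path("Input Bills"), Path("Bills To Pay")),
}


def file_paid_bill(bill: Path, cabinet: Path) -> Path:
    """Move a paid bill from the action folder into the general file cabinet."""
    cabinet.mkdir(parents=True, exist_ok=True)
    target = cabinet / bill.name
    if target.exists():
        # Refuse to overwrite a document that is already filed.
        raise FileExistsError(f"Already filed: {target}")
    shutil.move(str(bill), str(target))
    return target
```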

The great thing about the digital file cabinet is that once it’s configured, it just runs. Simply receive the paper documents, scan them, shred them, and search or browse to the proper directory to retrieve them. I understand that in the beginning you may choose to keep the paper copy, even knowing it is unlikely that you will ever need to access it.

Over three years of using my digital file cabinet system, I have yet to lose a document and have re-purposed less than one percent of them. I have accumulated more than 250,000 pages of documents. I was able to achieve this by confirming all scans (one file in, one result out) and by having a proper backup procedure in place.
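The “one file in, one result out” check can also be automated. The following is a minimal sketch under two assumptions that are not part of the system described above: the scanned originals are kept in a folder until they have been checked, and each converted file keeps the base name of the image it came from. The function name is invented for this example.

```python
from pathlib import Path


def find_unconverted(scanned_dir: Path, converted_dir: Path) -> list[str]:
    """Return names of scanned images that have no converted file with the same base name."""
    converted_stems = {p.stem for p in converted_dir.iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in scanned_dir.iterdir()
        if p.is_file() and p.stem not in converted_stems
    )
```

An empty result means every scan produced an output, so the paper originals for that batch can be shredded with more confidence.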

You have now essentially created a basic taxonomy and personal content management system that will reduce the risks and frustrations associated with your paper documents. Be careful: once you start document scanning and conversion, it gets addictive!

About Chris Riley:
Mr. Riley is an independent industry expert in document recognition (OCR, ICR, OMR, Data Capture) and analytics technologies. Founder of LivingAnalytics, Inc., he lives and breathes technology and has been helping companies buy, use, and optimize these technologies for their business processes and use cases. Chris’s focus on market education has helped expose underlying problems in the acquisition and use of advanced technology and has helped both end-users and vendors mitigate the challenges to achieving ultimate success. Riley has 12 years of experience in this arena, has owned three software companies, and has received several technology and business awards. He has bachelor’s degrees in Business Administration, Computer Science, and Mathematics, and holds certifications from the enterprise content management trade organization AIIM as “Enterprise Content Management Practitioner (ECMp)” and “Information, Organization, and Access Practitioner (IOAp)”. Mr. Riley is a sought-after speaker and educator throughout the content gathering and business intelligence space.