The Bell Curve Document Indexing-Imaging
Greg Krehel
Introduction
Remember the “bell curve” from statistics class? The
bell curve, so named because of its shape, illustrates the frequency distribution of many phenomena,
for example, height. Measure a thousand people. For every person over 7', you’ll have a mob
between 5'6" and 5'10".
Let’s apply the bell curve to the document collections produced during discovery. Out of every thousand
cases, how many involve 1,000,000+ documents? 100,000+? 10,000+? What does this distribution suggest
regarding strategies for imaging and searching documents?
Giant Cases – Special Tools Required
We’re all familiar with cases in which millions of documents are produced during
discovery. But we’ve also seen individuals over 7' tall. Both are outliers that occur
infrequently. Out of every thousand cases, only a handful have 1,000,000 or more documents.
Cases with document collections of over 100,000 are also relatively rare. Do even a hundred cases out of
every thousand involve this many documents? Widespread use of email has dramatically increased the
volume of documents present in many cases, but it hasn’t turned every case into a document
monster.
Dealing with 1,000,000+ documents or even 100,000+ justifies a substantial investment in scanning and
coding. This type of case also demands sophisticated software tools such as Concordance, iCONECT,
IPRO, Litigator’s Notebook, or Summation to assist with document indexing, image handling, and
more.
So that’s the story for the giant cases lurking out in one tail of the bell curve. But what about the
cases that populate the rest of the curve? How many documents do these cases involve? What’s an
appropriate image handling and text searching solution for them?
Normal Cases – Perfect for Adobe Acrobat
Cases with very small document collections fall at the other end of the curve. For every
1,000,000-document case, there’s a case that involves a single Redweld folder of documents. These cases
with only a single folder or box of documents are probably as rare as the ones with massive
quantities of documents.
Which brings us to the approximately 70% of all cases that fall into the center area of the bell curve. My
experience suggests these cases have between 1,000 and 50,000 documents. That is a small number relative to
a gargantuan million-document case, but still a heap of paper, and more documents than any trial team
can memorize in detail. It is certainly a document collection that should be imaged and available
in a searchable form.
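The "approximately 70%" figure above comes from experience, not from a formula, but it sits close to a familiar property of a true bell curve: roughly 68% of a normal distribution lies within one standard deviation of the mean. The short Python sketch below checks that number using only the standard library. It assumes an idealized normal distribution, which is an analogy here and not a claim about how case sizes are actually distributed; the function name share_within is made up for illustration.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # Assumption: an idealized standard normal curve (mean 0, standard deviation 1).
    standard = NormalDist(mu=0.0, sigma=1.0)
    return standard.cdf(k) - standard.cdf(-k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Run as a script, this prints about 68.3%, 95.4%, and 99.7% for one, two, and three standard deviations.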
If your firm has one of the excellent products mentioned above, it can definitely be put to work on
smaller matters as well. However, another wonderful option to consider on cases with small or
mid-sized document collections is having documents scanned as PDF and using Adobe Acrobat.
There are numerous reasons Acrobat makes a great choice for a case with a normal-size document population.
The fact that the PDF format has become ubiquitous is a benefit in and of itself. You may already
own and be comfortable with Acrobat, perhaps in connection with court-filing requirements. It’s
very likely that expert witnesses, other law firms, and even your clients are familiar with PDF files and
have either a full Acrobat license or the free Adobe Reader, making it easy to share case documents.
Why has the PDF format become the de facto standard for electronic versions of paper documents? The
primary reason is that a single PDF file can contain the images of all pages of the paper document
as well as the associated document text, typically captured by optical character recognition (OCR)
software.
If you’re new to document imaging, you may be surprised to learn that, prior to the introduction of
the PDF format, the standard way to create electronic versions of paper documents was to generate a
series of single-page TIFF images and a separate OCR text file. Thus, scanning a 15-page document
would yield a total of 16 separate electronic files: 15 TIFFs and a text file.
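The file-count arithmetic is easy to scale up to a whole collection. The Python sketch below is illustrative only: the function names and the sample page counts are assumptions, not figures from any real case. It counts one TIFF per page plus one OCR text file per document, and compares that with one PDF per document.

```python
from typing import Iterable


def files_for_tiff_workflow(page_count: int) -> int:
    """One single-page TIFF per page, plus one OCR text file for the document."""
    if page_count < 1:
        raise ValueError("a document needs at least one page")
    return page_count + 1


def files_for_pdf_workflow(page_count: int) -> int:
    """One PDF holds every page image and the text, whatever the page count."""
    if page_count < 1:
        raise ValueError("a document needs at least one page")
    return 1


def collection_totals(page_counts: Iterable[int]) -> tuple:
    """Return (TIFF-workflow file total, PDF-workflow file total) for a collection."""
    counts = list(page_counts)
    tiff_total = sum(files_for_tiff_workflow(n) for n in counts)
    pdf_total = sum(files_for_pdf_workflow(n) for n in counts)
    return tiff_total, pdf_total


if __name__ == "__main__":
    # Hypothetical collection: three documents of 15, 2, and 40 pages.
    print(files_for_tiff_workflow(15))
    print(collection_totals([15, 2, 40]))
```

For the 15-page document in the text, the first function returns 16; the PDF approach always returns 1.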
When scanning first became available, the Many Electronic Files = 1 Paper Document approach was as good
as it got and certainly beat nothing at all. However, with the advent of PDF, which meant that 1
Electronic File = 1 Paper Document, it wasn’t long before PDF ruled the roost.
The argument for PDF has become even stronger following Adobe’s release of Acrobat 6. This important
new version of Acrobat offers numerous enhancements, including cross-PDF text searching and improved
document mark-up functionality. For example, you can search a folder containing any number of PDF
files and instantly locate those containing any term or phrase.
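For readers curious about what a folder-wide search involves, the idea can be sketched in a few lines of Python. This is not how Acrobat implements its search; it is a minimal illustration that assumes the third-party pypdf library is installed for text extraction. The names extract_pdf_text and search_folder and the sample folder path are hypothetical.

```python
from pathlib import Path
from typing import Callable, List


def extract_pdf_text(path: Path) -> str:
    """Return the text layer of a PDF. Assumes the third-party pypdf package."""
    from pypdf import PdfReader  # install with: pip install pypdf

    reader = PdfReader(str(path))
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def search_folder(
    folder,
    term: str,
    extract: Callable[[Path], str] = extract_pdf_text,
) -> List[Path]:
    """Return the PDFs in folder whose text contains term, ignoring case."""
    needle = term.casefold()
    hits = []
    # Only files ending in lowercase ".pdf" directly inside the folder are checked.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if needle in extract(pdf).casefold():
            hits.append(pdf)
    return hits


# Example call (the folder path is a placeholder):
# for hit in search_folder("case_documents", "indemnification"):
#     print(hit.name)
```

A PDF that contains only page images has no text layer to extract, so it will never appear in the results.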
Here’s a final tip for any reader who’s yet to experiment with document imaging: Using Acrobat is a great
way to get comfortable using electronic documents without jumping into the deep end of the pool.
Don’t scan every case document until you’re sure it’s worth the effort. Instead, identify the
100 or so most critical documents and have them scanned as PDFs and put in a folder on your network
from which they can be searched. You’ll be able to evaluate the benefits of using electronic
versions of case documents with a minimal investment of time and expense.
When you have documents produced during discovery imaged, be sure to let the scanning vendor or the
in-house support staff who do the scanning know you want the resulting PDFs to contain both images
and text. If you’re not clear about this requirement, you may get back PDFs that contain only
images and not the associated text of the documents. PDFs that contain only images cannot be
searched.
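One way to spot-check what comes back from scanning is to see whether each page yields any text at all. The Python sketch below is a rough check under stated assumptions: it relies on the third-party pypdf library for extraction, the function names are hypothetical, and it will also flag pages that are genuinely blank in the paper original.

```python
from pathlib import Path
from typing import List


def page_texts(path: Path) -> List[str]:
    """Extract the text of each page. Assumes the third-party pypdf package."""
    from pypdf import PdfReader  # install with: pip install pypdf

    return [(page.extract_text() or "") for page in PdfReader(str(path)).pages]


def pages_without_text(texts: List[str], min_chars: int = 1) -> List[int]:
    """Return 1-based page numbers with fewer than min_chars non-blank characters."""
    return [
        number
        for number, text in enumerate(texts, start=1)
        if len(text.strip()) < min_chars
    ]


def is_searchable(texts: List[str]) -> bool:
    """True when the document has pages and every page produced some text."""
    return bool(texts) and not pages_without_text(texts)


# Example call (the file name is a placeholder):
# texts = page_texts(Path("exhibit_001.pdf"))
# print(is_searchable(texts), pages_without_text(texts))
```

A document in which every page is flagged is very likely an image-only PDF and worth sending back for OCR.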
Conclusion
If you only handle cases with a gazillion documents, Adobe Acrobat isn’t the right answer for
image-handling and text searching. However, for the vast majority of us, Acrobat is a fantastic
solution for some or all cases. If you haven’t put Acrobat to the test, you owe it to yourself to
try it on an upcoming matter.
Copyright 2004 Greg Krehel. All rights reserved.
Greg Krehel is CEO of Bowne-DecisionQuest’s CaseSoft division. CaseSoft is the developer of the popular software tools
CaseMap, TimeMap, DepPrep, and NoteMap. CaseMap makes it easy to organize and explore the facts, the
cast of characters, and the issues in any case. TimeMap makes it a cinch to create chronology
visuals for use during hearings and trials, client meetings and brainstorming sessions. DepPrep
helps prepare clients for depositions. NoteMap makes it easy to create, edit, and use outlines. In
addition to his background in software development, Mr. Krehel has over 15 years of trial consulting
experience. You can reach him via e-mail (gkrehel@casesoft.com) or telephone (904-273-5000).