We all struggle with the rising cost of e-discovery. What used to be a simple matter of delivering a few boxes of paper for a production has become a tide that sometimes breaks the bank of litigation. Today, merely mentioning the word “e-discovery” prompts thoughts of settlement. E-discovery and the entire production process are often mired in technical mumbo jumbo, geek speak, frustration, cost overruns and fear.
How did litigation come to this? We can blame many factors, including the exponential growth of data, the diversity of data sources, the complexity of collecting custodian data, vendor pricing or even the cloud. Let’s first look at how we arrived here.
In the good old days (as some may remember them), we used a photocopier, a staple remover and some colored slips of paper to separate documents. If anything was electronic, we performed a process called “blowback” to print it to paper, but in the end, it was all reduced to good old ink on tree bark and delivered to opposing counsel. We carried it around in bankers’ boxes, often on two-wheel dollies, and built witness binders or filled red rope folders with good old paper. “Just print it out and put it on my desk” is a line from a viral Internet video about a litigation support professional meeting with an attorney.
We then quickly moved into the world of electronic data, known today as ESI (electronically stored information), and the banks of photocopiers couldn’t keep up. We needed a new medium and turned to TIFF (Tagged Image File Format). Developed by Microsoft and Aldus in 1986, TIFF was created primarily for input and output devices such as printers, monitors and scanners, and it was designed to be compatible with a wide range of image-processing devices. In other words, it was a file format designed for the print industry, and we adopted it for e-discovery.
TIFF worked well enough back in the day; it is even a standard for fax machines, which originally ran over analog phone lines, because it could be transmitted easily across slow data links. But who owns a fax machine today? And why are we still using technology that is 30 years old?
As the e-discovery industry matured, it looked for a new format and soon adopted PDF. But, again, PDF was designed for the print industry, not for e-discovery. It works well: it stores a preserved image rendition of the printed page and can contain text for searching. TIFF, on the other hand, doesn’t contain searchable text, so we all came up with a brilliant contraption called TIFF/TEXT and adopted the Concordance load file formats (OPT and DAT) as one de facto industry standard for delivering productions.
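For readers who have never opened one, here is a minimal Python sketch of what those two load files contain. It assumes the commonly used layouts: one OPT (Opticon) line per page image with seven comma-separated columns (page ID, volume, image path, a “Y” document-break flag, folder break, box break, page count), and a DAT file using Concordance’s default delimiters (þ, ASCII 254, around each value; ASCII 20 between fields; ®, ASCII 174, in place of line breaks). The helper names `opt_lines` and `dat_line` and the sample Bates numbers are hypothetical, and real productions often stipulate different delimiters or fields.

```python
# Concordance default delimiters (assumed; productions often specify their own).
FIELD_SEP = "\x14"  # ASCII 20, displayed as a pilcrow in old DOS code pages
QUOTE = "\xfe"      # ASCII 254, thorn, wraps every value
NEWLINE = "\xae"    # ASCII 174, registered sign, stands in for line breaks


def opt_lines(prefix, start, volume, image_paths, width=6):
    """Build one Opticon (OPT) line per page image of a single document.

    Columns: page ID, volume, image path, document-break flag ("Y" on the
    first page only), folder break, box break, page count (first page only).
    """
    lines = []
    for i, path in enumerate(image_paths):
        first = i == 0
        lines.append(",".join([
            f"{prefix}{start + i:0{width}d}",  # zero-padded Bates-style page ID
            volume,
            path,
            "Y" if first else "",
            "",  # folder break (unused in this sketch)
            "",  # box break (unused in this sketch)
            str(len(image_paths)) if first else "",
        ]))
    return lines


def dat_line(values):
    """Build one DAT row: each value wrapped in thorns, fields joined by ASCII 20."""
    cleaned = (v.replace("\r\n", NEWLINE).replace("\n", NEWLINE) for v in values)
    return FIELD_SEP.join(QUOTE + v + QUOTE for v in cleaned)


# Hypothetical three-page document and its metadata row.
pages = [
    r"IMAGES\001\ABC000001.tif",
    r"IMAGES\001\ABC000002.tif",
    r"IMAGES\001\ABC000003.tif",
]
opt = opt_lines("ABC", 1, "VOL001", pages)
dat = [
    dat_line(["BEGDOC", "ENDDOC", "CUSTODIAN", "TEXTPATH"]),
    dat_line(["ABC000001", "ABC000003", "Smith, J.", r"TEXT\001\ABC000001.txt"]),
]
```

Even this toy version shows why load files demand care: one misplaced delimiter or document-break flag is enough to scramble how the receiving tool reassembles the production.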
Let’s recap. Both TIFF and PDF essentially capture the old ink-on-bark paper page as an image and preserve it in a file. Some will argue that’s a good thing, because both formats preserve an image of the original file. Essentially, we’re still doing blowbacks, only now to files instead of paper. Thanks for saving the trees, folks, but here’s the rub: it’s an expensive process. Very expensive.
Talk to any litigation support processing specialist and they will tell you that the most difficult, time-consuming and costly process they do is convert a perfectly good native file into a two-dimensional page rendition called a TIFF or PDF image. Add to that the geek mystical powers it takes to properly format a decent load file and you begin to understand why things are getting expensive – and why they’re taking way more time than they should. And don’t even get me started on what someone is supposed to do with a bunch of single-page TIFF files, text files and a load file if they don’t have a document review tool.
Documents today aren’t the same as they were even a few years ago. A Word document can contain a hyperlink to an external source, an embedded graphic or a dynamic field that changes each time the document is opened or printed. Excel files consist of rows, columns and calculations, and rarely, if ever, fit comfortably on an 8½-by-11-inch page. I call these three-dimensional documents, and today everything is a three-dimensional document: Word, Excel and database files, even email messages. How do you turn three-dimensional things into two-dimensional things?
The print industry has TIFF and PDF. The engineering industry has AutoCAD’s DWG. The music industry has MP3 and WAV. The video industry has MPEG and AVI. Even radiology has a file format of its own (DICOM). Why doesn’t e-discovery have a file format designed specifically for its needs?
We’re seeing an emerging trend in discovery stipulations to “produce Excels as native,” born largely of the difficulty of turning spreadsheets into two-dimensional renderings. But there is still reluctance to do so. Why? Security concerns. Why not produce all documents natively? Again, security concerns, plus the fact that metadata fields (information about the files, such as author, subject, from and to) must be produced as well. Those metadata fields get stuffed into the DAT file I mentioned earlier.
Let’s look at a couple of charts. (All articles need charts, right?) The first shows the exponential growth of data, one reason for the explosion in e-discovery costs. The second shows the history of file formats for production. To put the first chart into context, 40 zettabytes (1 zettabyte = 1,000 exabytes) amounts to 57 times the number of grains of sand on all the beaches on Earth. Is there any doubt that e-discovery will keep getting more expensive?
By comparison, let’s look at the technology curve for the adoption of new production formats. In 30 years, little has been done to address the issue of production formats. If we assume that the largest cost of e-discovery is in rendering native files into page image equivalents, then there is an obvious choice: Create a new production standard, one specifically designed for e-discovery that addresses the concerns of cost, security and packaging that older approaches have not solved.
The major obstacles to adopting native file productions are security, inclusion of metadata, family relationships (e.g., an email and its attachments) and redactions. To overcome these obstacles, I designed a new architecture called Encapsulated Native File, or ENF.
An ENF file contains a fully encrypted native file, along with its metadata and all the members of its family. The ENF file architecture has been designed, and preliminary tools for creating ENF files (makeENF) and viewing them (viewENF) have been built. ENF is a file format specifically for the e-discovery industry, designed to eliminate all barriers to native file productions and to reduce e-discovery costs significantly by removing the labor of translating native files into two-dimensional renderings. Think of the savings in turnaround time, from collection to review to production, if we simply repackage native files into a self-contained file with all properties included, adding security levels and passwords along the way.
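The ENF specification itself isn’t described in detail here, so the following is only a hypothetical Python sketch of the packaging idea: one container holding the native files of a family plus a manifest with their metadata and parent/child relationships. The ZIP container, the `manifest.json` entry and the SHA-256 hashes are assumptions chosen for illustration; `make_package` and `read_package` are not the makeENF or viewENF tools, and the sketch deliberately leaves out the encryption and password protection that ENF calls for.

```python
import hashlib
import io
import json
import zipfile


def make_package(parent, children, metadata):
    """Hypothetical packager: bundle a parent native file, its family and metadata.

    parent and each child are (filename, bytes) tuples. Returns the container
    as bytes. NOTE: no encryption is applied in this sketch.
    """
    members = [parent, *children]
    manifest = {
        "metadata": metadata,
        "family": {"parent": parent[0], "children": [name for name, _ in children]},
        # Hashes let a recipient confirm each native file is unaltered.
        "sha256": {name: hashlib.sha256(data).hexdigest() for name, data in members},
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in members:
            zf.writestr("natives/" + name, data)
    return buf.getvalue()


def read_package(blob):
    """Return (manifest, {filename: bytes}), verifying every hash."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        files = {}
        for name, expected in manifest["sha256"].items():
            data = zf.read("natives/" + name)
            if hashlib.sha256(data).hexdigest() != expected:
                raise ValueError(f"hash mismatch for {name}")
            files[name] = data
    return manifest, files


# Hypothetical family: an email with one spreadsheet attachment.
blob = make_package(
    ("email.msg", b"...email bytes..."),
    [("budget.xlsx", b"...spreadsheet bytes...")],
    {"From": "a@example.com", "To": "b@example.com", "Subject": "Q3 budget"},
)
manifest, files = read_package(blob)
```

The point of the sketch is that no page rendering happens anywhere: the natives travel untouched, and metadata and family relationships ride along in the same file instead of in a separate DAT. A real implementation would also need to encrypt the natives and enforce passwords, which this sketch does not attempt.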
But before we can realize its benefits, this new standard must be adopted by multiple vendors and by the e-discovery industry as a whole.
Let’s look at the lifecycle of e-discovery. It starts with collecting native files, which are then loaded into a processing tool to extract the metadata, the text and the parent-child relationships (emails and their attachments). Next, the native files are transformed into “print images,” a rendering process that is often error-prone and labor-intensive. Any anomaly in a file must be handled by a hands-on technical professional who adjusts margins, narrows columns and corrects orientation, all in an effort to squeeze three dimensions into two. Once that lengthy process is complete, the results are loaded into a document review platform.
Some people will point out that several review tools today support reviewing native files. That is a step in the right direction, since it lets review start sooner. To achieve that, some type of viewing technology is needed (e.g., Quick View Plus or Oracle’s Outside In). Once documents are selected for production, however, they are typically produced in an image format with corresponding load files, which act like sewing machines that stitch the individual image files back together into documents. But what if we simplified the process? And lowered the cost? And made the process faster and more efficient? (See accompanying diagrams.)
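To make the “sewing machine” role concrete, here is a minimal Python sketch that reads OPT image load-file lines and regroups single-page images into documents using the “Y” document-break flag in the fourth column. It assumes the common seven-column Opticon layout; the function name `documents_from_opt` and the sample lines are hypothetical.

```python
import csv


def documents_from_opt(lines):
    """Group OPT page rows into documents using the document-break flag.

    Returns {first page ID: [image path, ...]} in load-file order.
    """
    docs = {}
    current = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        page_id, _volume, image_path, doc_break = row[:4]
        if doc_break == "Y" or current is None:
            current = page_id  # a new document starts on this page
            docs[current] = []
        docs[current].append(image_path)
    return docs


# Hypothetical load file: a two-page document followed by a one-page document.
opt = [
    r"ABC000001,VOL001,IMAGES\001\ABC000001.tif,Y,,,2",
    r"ABC000002,VOL001,IMAGES\001\ABC000002.tif,,,,",
    r"ABC000003,VOL001,IMAGES\001\ABC000003.tif,Y,,,1",
]
docs = documents_from_opt(opt)
```

In practice, the loading tool typically also has to match each stitched document to its DAT metadata row by the beginning Bates number, which is one more place where a small formatting error can break a production.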
ENF carries forward the same viewer technology that document review tools use, so viewENF displays ENF files just as a review tool would. The viewENF application supports encryption and password protection to limit which features can be used on the embedded native file, and the ENF file it opens encapsulates the extracted text, metadata and complete family tree. In effect, viewENF becomes the new Adobe Reader for the e-discovery industry.
Could a new architecture, a new file format, designed specifically for the e-discovery industry save costs? Yes. It’s time to leave 30-year-old technology behind and let Generation Y invent a new paradigm.
Published December 2, 2015.