The Demise Of Processing As We Know It

Monday, December 6, 2010 - 00:00

Editor: The last time we spoke, you discussed how companies were incorporating eDiscovery into their Governance, Risk and Compliance (GRC) strategies. Do you have any bold predictions for eDiscovery trends building on those comments?

d'Alencon: Looking ahead, I would say one of the bigger trends heading into 2011 is that processing is dying a quick death, at least as we know it today.

Editor: That sounds somewhat dire. What do you mean?

d'Alencon: In hindsight, processing of paper documents and electronically stored information (ESI) was likely the primary cost driver in the eDiscovery process. As little as three to five years ago, it was commonplace to hear of clients being charged thousands of dollars per gigabyte to "process" these documents and files. There were many reasons for this: most corporate legal departments outsourced this work on a case-by-case basis; paper was far more prevalent in the average case than ESI; processing technology was immature and somewhat arcane; and the technology to identify and collect ESI was also just beginning to hit its stride. All of this meant that relatively expensive human labor had to be used and that processing those documents and files took a long time.

As a major cost driver for discovery/eDiscovery, processing is being squeezed towards zero, which has some implications for the service provider and vendor community. This is due to some macro trends, including corporations bringing more of the eDiscovery process in-house, the declining importance of paper and the continued advancement of technology.

Editor: Can you tell us more about these macro trends?

d'Alencon: This has been an interesting year, especially as we come off the longest and deepest recession since the Great Depression. For those of us in electronic discovery, the cost-conscious mindset resulting from the dour economic reality, coupled with an increasing number of court rulings finding that claims of "burden and expense" and "not reasonably accessible" were exaggerated, continued to drive many corporations to take control of their eDiscovery processes.

Until that point, companies had really left most of the work to their law firms and vendors, so this increasing corporate involvement was something of a sea change. I think it is fair to say that historically the legal industry as a whole has not been quick to embrace change. In this industry, technology to drive efficiency and cost control took a back seat to risk mitigation, even when the ESI (and now social media) became the primary driver of the average case rather than paper.

But that is not how most companies think. Inside a corporation, people are making daily dollars and cents decisions based upon operating budgets. How much is this case likely to cost me? Based on that estimate, is it even worth litigating? And at a higher level, how much will all of my cases cost? Is there a way to reduce the spend and risk by using new technology or process improvements? How do I bring eDiscovery in line with my other business processes?

My experience would suggest that corporations have increasingly awakened to the need to take control of their eDiscovery processes and have investigated technology to drive cost and risk containment for litigation support, regulatory compliance efforts and more. Initially this effort started with software on the "left side" or IT-driven side of the Electronic Discovery Reference Model (EDRM), which includes Information Management, Identification, Collection and Preservation.

Editor: So how does that impact processing?

d'Alencon: From my perspective, the act of processing documents and files adds no additional value to the data. It is a means to move this data from one product to another and simply adds cost. Historically, data arrived at the processing stage in many different formats and needed to be restored and normalized. Put simply, processing includes indexing the metadata and content so that the files can be searched; culling the data defensibly to reduce the collection to just what is relevant for downstream review and production; standardizing the data so it can be shared or produced; and generating exception reports for files that could not be processed.

Solution providers charge processing fees per GB to hash and de-duplicate files, extract metadata and convert to standard, read-only image formats, usually TIFF and more recently PDF. Processing was the centerpiece for many early providers' value statements and made up a large percentage of a company's eDiscovery spend.
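A minimal sketch of those core processing steps - hashing files for de-duplication, extracting basic file metadata for indexing, and producing an exception report for unreadable files - might look like the following. The function and field names are hypothetical, chosen only to illustrate the workflow described above:

```python
import hashlib
import os
from datetime import datetime

def process_files(paths):
    """Sketch of three core processing steps: hash for de-duplication,
    extract basic metadata for indexing, and report exceptions."""
    seen_hashes = set()
    records, exceptions = [], []
    for path in paths:
        try:
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
        except OSError as exc:
            # Files that cannot be read end up on the exception report.
            exceptions.append((path, str(exc)))
            continue
        if digest in seen_hashes:
            continue  # exact duplicate: culled from the review set
        seen_hashes.add(digest)
        stat = os.stat(path)
        records.append({
            "path": path,
            "hash": digest,
            "size": stat.st_size,
            "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        })
    return records, exceptions
```

Real processing engines handle container extraction, near-duplicate detection and far richer metadata, but the shape of the work - hash, cull, extract, report - is the same.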

Software applications (email archives, SharePoint, etc.) that companies have more recently been implementing already include capabilities that address some or all of the three objectives of processing (index, cull and standardize).

Let's take indexing, which is one key objective of processing. Most of today's email archives and data management programs already have indexing, and therefore search, as an integral, built-in capability. These systems index discoverable ESI within the corporate infrastructure, populating defined fields with specific metadata - which is to say, these applications partially "process" the data already. So the data in the corporation's newly acquired email archive and SharePoint servers is already processed - you just need to get the ESI and metadata out.
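As a rough illustration of the kind of indexing these archives perform under the hood, the toy inverted index below maps each token to the set of documents containing it, so multi-term searches become set intersections. This is a simplified sketch; production archive indexes are vastly more sophisticated:

```python
import re
from collections import defaultdict

def build_index(documents):
    """Build a simple inverted index: each token maps to the set of
    document IDs whose content contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, *terms):
    """Return the IDs of documents containing all of the given terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()
```

Because the index is built as data enters the archive, a later eDiscovery search is just a lookup - no separate processing pass is required.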

Also some of these applications, such as Symantec Enterprise Vault Discovery Accelerator, have added de-duplication and other culling measures to reduce large datasets to just the relevant documents.

Plus the software continues to get smarter. New features, such as classification and preservation in the wild, have been reaching further into the corporate ecosystem. These new capabilities provide better and smarter identification of targeted data subsets, so that fewer documents need to be exported. Previously, collections would grab huge numbers of documents and files that would end up in processing, but this is no longer the case.

So the software on the left side of the EDRM has co-opted two of the three main objectives of processing. That lowers the need for processing software. And the products on the right of the EDRM are also getting smarter. Most review and production software no longer requires that ESI be standardized, which reduces the need to convert source files to a common image format such as TIFF. These review systems support native review and production. Today, mature eDiscovery software, like CaseCentral, includes image conversion on demand, so converting even large numbers of files to an image format is no longer a fee-based activity; it has become a function of the core software, rather than a separate utility. The image-conversion cost has essentially been rolled into the overall cost of the software in this case.

Editor: So you are saying that a number of factors are contributing to this decline of processing?

d'Alencon: Yes. As companies continue to bring eDiscovery into the corporate fold, integration of functionality will continue in the corporate IT infrastructure. What this means for eDiscovery is smarter applications and more connectivity between left-side data sources run by IT and right-side software applications used by business users, so there will no longer be a need for standalone processing. We can see this already through software integration as evidenced by CaseCentral's integrated Connector to Symantec Enterprise Vault Discovery Accelerator (EVDA), through industry standards, such as the EDRM XML, and through industry acquisitions.

Editor: You spoke of the new CaseCentral Connector in our last interview. Why is this important?

d'Alencon: The CaseCentral Connector is a software product that programmatically integrates the transfer of documents and files stored in Symantec Enterprise Vault, which can be put into case files using Symantec Discovery Accelerator and then analyzed, reviewed and produced using CaseCentral. The important point is that at no time do human hands need to process or re-process the data from collection through to production and, since the data transfer is highly automated, the number of data transformations typical in normal processing is eliminated. Net-net, this is an example of how the industry is integrating products and providing an end-to-end capability in order to remove unnecessary time, risk and cost.

Editor: What is the EDRM XML standard?

d'Alencon: The Electronic Discovery Reference Model (EDRM) group has been working for some time on adoption of a proposed metadata standard, the EDRM XML. First launched in 2008, the EDRM XML claims to "help practitioners significantly streamline processes and enable the integration of multiple e-discovery technologies." Basically, use of the EDRM XML provides relief from proprietary load formats and reduces the time to transfer ESI from one supported system to another, in addition to other benefits.
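To make the idea of a standard load format concrete, the sketch below generates a simplified XML fragment describing documents and their metadata tags. The element and attribute names here are hypothetical stand-ins, loosely modeled on the kind of structure an interchange format like the EDRM XML defines; the actual EDRM XML schema is considerably richer and differs in its details:

```python
import xml.etree.ElementTree as ET

def make_load_file(documents):
    """Serialize a list of document records into an illustrative,
    simplified XML load file (element names are hypothetical)."""
    root = ET.Element("Root")
    batch = ET.SubElement(root, "Batch")
    docs = ET.SubElement(batch, "Documents")
    for doc in documents:
        el = ET.SubElement(docs, "Document", DocID=doc["id"])
        tags = ET.SubElement(el, "Tags")
        # Each metadata field (custodian, file type, etc.) becomes a tag.
        for name, value in doc["tags"].items():
            ET.SubElement(tags, "Tag", TagName=name, TagValue=value)
    return ET.tostring(root, encoding="unicode")
```

Because both the producing and consuming systems agree on one schema, no proprietary conversion step - and hence no separate processing fee - sits between them.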

Editor: Do you think processing will eventually cease altogether?

d'Alencon: Companies that are taking control of eDiscovery will continue to drive the business case for software integration as the result of an increasing need for efficiency (time, cost, risk). And there is increasing evidence of software providers reacting to their clients' needs for this integration.

As left-side software is getting smarter, collections are getting smaller and the need for processing is decreasing. Data volumes and associated costs are dropping. Software integrations will further drive these costs down. Interestingly, when Barry Murphy was an industry analyst at Forrester Research (he now is an independent consultant at the eDiscovery Journal), he predicted this leveling off of the cost of processing back in 2006: "the price per GB to process data - currently about $1,800 per GB - will be forced down to approximately $500 per GB by 2011 as more competitors enter the market and as enterprises conduct more due diligence about the solutions they choose."1

Ironically, Barry's 2011 cost estimate was a little high. His report goes on to say that application advancements would allow for better culling within the enterprise. These predictions were right on the mark. Based upon these trends, vendors who rely primarily on processing revenue will likely be squeezed out until there are just a few.

Of course, processing will not disappear completely. Even with the assimilation of processing capabilities into left-side tools and the integration with review and production applications, for the foreseeable future there will be file types that fall outside of the defined process, paper that needs to be scanned and coded, and other exceptions that will require human intervention. But for the majority of cases, the necessary file types and content will be processed quickly and transparently with a few mouse clicks.

I think the inescapable conclusion is that processing will increasingly become an embedded part of the fabric of the software tools that we use and, more importantly, become a significantly smaller, predictable software cost, rather than an expensive, repetitive process.

Please email the interviewee with questions about this interview.