Editor: Over the past five years, an onslaught of e-discovery systems has hit the market. But e-discovery costs remain high. Have these systems failed to deliver on their promise?
Sharp: I think that the last few years have seen a dramatic transformation in e-discovery. E-discovery has become a way of life. The development of e-discovery technology has transformed e-discovery from a costly mountain-climbing expedition, undertaken only by the brave and the foolhardy, or at least by those with an adventurous spirit, into what has become almost a gentle stroll in the park. Five years ago electronic discovery was still largely uncharted territory. That has all changed dramatically with the emergence of advanced e-discovery systems. The process has become standardized and automated. The focus has been on simplifying and reducing the costs of data processing and hosting. The systems have successfully mechanized all the data-handling elements that you need to enable attorney review in an e-discovery process, including file conversion, database construction and indexing. In this respect we have seen outstanding results. Data processing has been routinized, and costs have dropped by a factor of five, ten, twenty and, in some cases, even fifty.
Editor: Costs are one aspect of the problem. But the objective of e-discovery is the discovery of relevant documents. Companies face huge risks for failure to comply with e-discovery demands. Given the stakes, can we afford to sacrifice quality for cost?
Sharp: I would definitely agree with you that costs are one aspect of the problem. I would go further by saying that processing costs are only one aspect of total e-discovery costs. Technology today has virtually eliminated processing costs as an issue in e-discovery. That mountain has been blasted away. But now the dust is settling on what's left of the processing cost "mountain" and the industry has discovered, to its dismay, that the processing mountain was just blocking the view. What has been revealed is the existence of three far larger mountains - review costs, process quality and measurability.
Review costs can legitimately be called a mountain. The cost of review far exceeds processing costs. Technology has solved the processing problem, but it turns out that this is the smaller part of the problem.
The second mountain that has been exposed is the quality of the process. You ask whether it is acceptable to sacrifice quality in order to reduce costs. When it comes to litigation review, sacrificing quality is absolutely not an option. As a business process, the objective of e-discovery is the discovery of relevant documents. The standard legacy process is based on keyword searching. Keyword searching requires the attorney to think of words and phrases that, when fed into a search engine, will retrieve relevant documents. However, this process is broken. A series of independent studies has shown that keywords typically find only 20 to 30 percent of relevant documents. That is fine for the 20 to 30 percent that you find, but less good for the 70 to 80 percent of the relevant documents that you leave on the table and that never see the light of day. Moreover, keywords are very imprecise, and usually yield a review set the vast majority of which is junk and noise. This is a major indictment of electronic discovery as a viable business process.
The third mountain, recently exposed, is that the e-discovery process is based on guesswork. Management guru Peter Drucker once said that if you can't measure it, you can't manage it. E-discovery is probably one of the last of the Mohicans, one of the last remaining business processes that you simply cannot measure. The legal profession now realizes that a new generation of capabilities is required to address these three "mountainous issues" - review costs, process quality and measurability. The industry's emerging solution to these challenges is "smart discovery."
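The two quality problems described above, finding only a fraction of the relevant documents and reviewing a set that is mostly junk, correspond to what information retrieval calls recall and precision. The following is a minimal sketch of how those two numbers could be computed; the function name, the document ids and the counts are hypothetical and chosen only for illustration, not taken from any study mentioned in the interview.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved document ids.

    recall    = share of all relevant documents that were retrieved
    precision = share of retrieved documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: documents 0-99 are the relevant ones.
relevant = range(100)
# Hypothetical keyword search: 25 relevant hits plus 225 irrelevant ones.
retrieved = list(range(25)) + list(range(1000, 1225))

recall, precision = recall_and_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this made-up case the search finds a quarter of the relevant documents, and nine out of every ten documents sent to review are irrelevant.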
Editor: What do you mean by "smart discovery"?
Sharp: Smart discovery is a new paradigm, and the objectives of this new paradigm are threefold: first, to reduce review costs to a reasonable, proportionate level; second, to find much more of the relevant material; and third, to eliminate guesswork by transforming e-discovery into a business process like any other, where quality is measurable and decision makers have the ability to quantify risk and cost.
Editor: How does smart discovery differ from the way that e-discovery is conducted today?
Sharp: The incumbent paradigm for e-discovery acknowledges that you cannot review all the documents in the collection. You don't have the resources, the time or the budget to review 100 percent of the documents in a given discovery population. So it is accepted that you will apply techniques to reduce the size of the collection to something more manageable. Firstly, as I mentioned earlier, the incumbent technique is keyword search. Keyword search yields massive review volumes, but only a small minority of the relevant documents. So the reviewers are missing most of the relevant material, and they are also required to review a significant amount of extraneous, irrelevant material to capture each relevant document. Secondly, the incumbent paradigm is based on high-volume review using low-cost, low-grade review resources, and is highly dependent on offshore and contract review. The third element of the incumbent paradigm is linear review. Linear review means that review is conducted document by document, with attorney eyes on every page. The fourth and final element of the standard approach is that quality assurance is ad hoc. There is no systematic process in place to ensure the quality of the process.
Editor: What is predictive coding and how does this fit into the notion of smart discovery?
Sharp: Predictive coding refers to the use of technology to extrapolate the review decisions of an expert attorney on a sample of documents to the entire collection. The results are phenomenal. Use of the technology shows that predictive coding, properly applied, can retrieve 70, 80 or 90 percent of the relevant documents. Obviously this compares very favorably with keyword searching. Not only that, but to retrieve 70, 80 or 90 percent of the relevant documents you are required to review only a fraction of the documents that would need to be reviewed under the traditional paradigm of keyword searches. While I am not suggesting that predictive coding be used as a replacement for human review, tests show that predictive coding is as accurate as, and in some cases even more accurate than, human review. The reason is simple. Predictive coding amplifies the knowledge and experience of your best-qualified reviewer; it is always consistent, and it never gets tired or hungry.
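The interview does not describe the algorithm behind predictive coding. As a rough sketch of the general idea, extrapolating an expert's decisions on a sample to unreviewed documents, the example below assumes a small naive Bayes text classifier as a stand-in. The function names and the sample documents are hypothetical, and a commercial system may work quite differently.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """Learn word counts from (text, is_relevant) pairs tagged by the expert."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in labeled_docs:
        counts[label].update(tokenize(text))
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def relevance_score(model, text):
    """Log-odds that a document is relevant (naive Bayes, add-one smoothing).

    Positive means "looks more like the relevant sample", negative the opposite.
    """
    counts, n_docs, vocab = model
    total = {c: sum(counts[c].values()) for c in (True, False)}
    score = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # words never seen in the sample carry no evidence
        p_rel = (counts[True][tok] + 1) / (total[True] + len(vocab))
        p_irr = (counts[False][tok] + 1) / (total[False] + len(vocab))
        score += math.log(p_rel / p_irr)
    return score


# Hypothetical sample reviewed by the expert attorney.
sample = [
    ("pricing agreement with competitor", True),
    ("discuss pricing before the bid", True),
    ("lunch menu for friday", False),
    ("office party on friday", False),
]
model = train(sample)

# Hypothetical unreviewed documents, ranked by predicted relevance.
unreviewed = ["friday lunch order", "competitor pricing call"]
ranked = sorted(unreviewed, key=lambda t: relevance_score(model, t), reverse=True)
print(ranked)
```

The scores produced this way are what make the later steps possible: documents can be ranked, split into tiers, or compared against human tags.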
Editor: Do you mean that smart discovery is simply a synonym for predictive coding?
Sharp: Smart discovery would not be possible without predictive coding, but predictive coding is really just part of the story. In order to achieve the paradigm change to which smart discovery aspires, you also need to crack open linear review by grouping documents logically. These logical groupings include grouping by topic, by near-duplicates and by email threads. This allows reviewers to review groups of logically related documents, rather than reviewing them one by one. Another key part of the smart discovery paradigm is statistics. You need to apply statistical methods to quantify and measure the e-discovery process. To summarize - there are three parts of smart discovery: predictive coding, document grouping and statistical modeling.
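The interview does not say which statistical methods are meant. One common and simple possibility is to review a random sample and estimate what proportion of the collection is relevant, with a margin of error. The sketch below assumes a normal-approximation 95 percent confidence interval; the function name and the sample figures are hypothetical.

```python
import math


def estimate_proportion(sample_labels, z=1.96):
    """Estimate the share of relevant documents from a random sample.

    sample_labels: list of booleans, True if the sampled document was
    judged relevant. Returns (estimate, lower bound, upper bound) using
    the normal approximation; z=1.96 corresponds to roughly 95 percent.
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical random sample: 400 documents reviewed, 40 judged relevant.
sample_labels = [True] * 40 + [False] * 360
p, low, high = estimate_proportion(sample_labels)
print(f"estimated relevant share: {p:.1%} (about {low:.1%} to {high:.1%})")
```

An estimate of this kind lets a decision maker put a number, with a stated uncertainty, on how much relevant material a collection is likely to contain, which is the sort of measurability the answer above calls for.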
Editor: Where does predictive coding fall within the e-discovery business process?
Sharp: There are essentially three places within e-discovery that you will see predictive coding used. Firstly, in early case assessment you need to make a "fight or flee" decision. Will you defend this case or will you settle? The relevance assessments generated by predictive coding allow the users to zoom in on the most relevant documents. This is important because it allows the team to make a rapid yet informed decision on whether the case is winnable, and if winnable, at what cost.
Secondly, in the culling process, predictive coding allows you to achieve a double win - review less and find more. Predictive coding, applied correctly, can retrieve two, three or even four times the number of relevant documents that would be retrieved with keyword search. Moreover, this can be achieved while reviewing fewer documents than would need to be reviewed in an equivalent review using keywords.
Thirdly, in the review process there are three elements at play: prioritized review, stratified review and quality assurance. Prioritized review means starting your review with the most relevant documents and working backwards. This enables you to accelerate the development of the case, and in many cases, wind up the review once sufficient key data has been collected. Stratified review is similar to prioritized review, but uses the relevance scores to reduce cost and risk. Stratified review involves taking the high-potential documents (that is, those with the highest relevance scores) and having them assigned for review by your highest-grade review resources. Similarly, the low-potential documents are assigned to your low-cost resources. Quality assurance is facilitated by comparing the relevance assessments generated by the technology against the relevance assessments generated by the human review team. In this way you can systematize the quality assurance process by surfacing those documents where there is a high likelihood that the human review team has made a tagging error.
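The three review elements just described can be sketched in a few lines once every document carries a relevance score. This is an illustration only: the `Doc` structure, the 0-to-1 score scale and the cutoff values are assumptions made for the example, not details given in the interview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    score: float                      # predicted relevance, assumed 0..1 scale
    human_tag: Optional[bool] = None  # reviewer's decision, None if not yet reviewed


def prioritized(docs):
    """Prioritized review: most relevant documents first."""
    return sorted(docs, key=lambda d: d.score, reverse=True)


def stratified(docs, cutoff=0.7):
    """Stratified review: high scores to senior reviewers, the rest to low-cost ones."""
    senior = [d for d in docs if d.score >= cutoff]
    junior = [d for d in docs if d.score < cutoff]
    return senior, junior


def qa_candidates(docs, high=0.8, low=0.2):
    """Quality assurance: surface documents where human tag and score strongly disagree."""
    return [
        d for d in docs
        if d.human_tag is not None
        and ((d.score >= high and not d.human_tag) or (d.score <= low and d.human_tag))
    ]


# Hypothetical documents with scores and some human tags.
docs = [
    Doc("D", 0.10, True),
    Doc("A", 0.95, False),
    Doc("C", 0.50),
    Doc("E", 0.05, False),
    Doc("B", 0.90, True),
]

order = [d.doc_id for d in prioritized(docs)]
senior, junior = stratified(docs)
flagged = [d.doc_id for d in qa_candidates(docs)]
print(order, [d.doc_id for d in senior], flagged)
```

Here documents A and D are flagged for a second look because the reviewer's tag contradicts a confident score; whether the reviewer or the score is wrong is still a human call.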
Editor: So who stands to lose from the movement to the smart discovery model?
Sharp: If you had asked me this question five years ago, I would have said, right off the bat, law firms. Five years ago, the billable hour was king. This is no longer the case. Law firms today recognize they are in a very competitive market and they will do what they have to do in order to be competitive. In this new business environment, predictive coding is a win-win technology. It is a win for the law firms because they are able to provide better value for their customers, the corporations, and in so doing, compete more effectively for their business. For the corporation it is also a win because this technology reduces both the cost and risk of litigation.
Editor: Who is leading the charge for smart discovery?
Sharp: I think my answer will surprise some people, but it is law firms who are leading the way. Corporations are not far behind, but savvy law firms see the application and the adoption of this approach as mission critical in a very competitive legal market in which they seek to reduce costs and enhance value.
Editor: Ten years from now, how do you think people will view the emergence of smart discovery in 2011?
Sharp: At the LegalTech Conference in New York in February, someone approached me saying "I have been in litigation support for 30 years. The last 30 years have been baby steps. We have achieved a lot in process automation and supporting systems, but it has been an evolutionary process. The emergence of predictive coding is the first time that we have had a true revolution. This is a watershed in the industry - the first time in the industry that we have seen a true revolution in the way that electronic discovery is conducted, and this is really changing the rules of the game." Maybe that's how people will look back on the emergence of predictive coding, but who knows. Making predictions is one thing you don't want to do in this industry.
Published May 2, 2011.