When you think of bold, innovative, downright disruptive users of cutting-edge technology, you naturally think of … lawyers?
Risk-averse, cautious, measured, paid-by-the-hour, brick-and-mortar, word-parsing, quintessentially nuanced (“I didn’t hear your question, but regardless my answer is: ‘It depends.’”) lawyers?
Lest the irony be missed, the legal industry is deservedly notorious for being a technological step or two – or more – behind its clients.
Yet the change has come. We attorneys are leading the way in making practical use of today’s brave new world of machine learning. Indeed, law firms and savvy corporate legal teams have been pioneering the use of artificial intelligence since the last decade, to the point that there is not a litigator of note today who hasn’t heard of predictive coding or technology-assisted review (TAR).
Why Lawyers, Why Now?
Several factors have led to this spike in innovation from such an unexpected source.
Factor 1: Big Data
The exploding amounts and kinds of data being generated by workers – in office programs, cloud apps, chat systems, shared workspaces and more – mean an ever-increasing challenge for legal and compliance officers, as all of this work product is potential evidence for litigation and investigations.
Factor 2: Bigger Cost
How expensive is legal document review? Studies suggest that, of all the money spent on litigation across the U.S. annually (estimated to be over $200 billion), 70 percent of it is spent on discovery – and 70 percent of that discovery spend goes to document review. So anything that can accelerate or reduce review means substantial savings for corporate clients.
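To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python using only the figures quoted above. The variable and function names are illustrative, and the $200 billion total is the text's own lower-bound estimate, so the result should be read as a rough floor rather than a measured number.

```python
# Figures quoted in the text; the dollar total is an estimate ("over $200 billion").
LITIGATION_SPEND = 200e9   # annual U.S. litigation spend, in dollars
DISCOVERY_SHARE = 0.70     # share of litigation spend that goes to discovery
REVIEW_SHARE = 0.70        # share of discovery spend that goes to document review


def review_spend(total, discovery_share, review_share):
    """Return the dollars spent on document review."""
    return total * discovery_share * review_share


spend = review_spend(LITIGATION_SPEND, DISCOVERY_SHARE, REVIEW_SHARE)
share = spend / LITIGATION_SPEND
print(f"Document review: ${spend / 1e9:.0f} billion ({share:.0%} of total)")
```

In other words, if the quoted percentages hold, roughly half of every litigation dollar goes to document review alone.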
Factor 3: The Unpleasantness of Document Review
No one enjoys reviewing irrelevant content. (Imagine if you had to carefully read every junk email you receive before deleting it.) For lawyers conducting review, front-loading more of the relevant content makes document review a far more engaging experience, which, in turn, improves their accuracy and productivity.
Factor 4: The Need for Speed
The old maxim to “never settle” means nothing in the context of litigation. By most accounts, over 95 percent of civil cases settle, as the uncertainty and cost of a trial are to be avoided at nearly all costs. In short, settling is good depending on the terms, so finding the key evidence that establishes or disproves liability early is critical to negotiating a favorable outcome. After all, the least costly document review is the one you don’t have to do.
Think Netflix or Pandora on Steroids
Predictive coding is essentially about finding “more like this,” where “this” is a piece of unstructured data (an email, slide deck, letter, memo, etc.) and “more like” are documents that are conceptually similar, whether or not they contain the same words that made “this” relevant in the first place.
This last point is key: Documents that are similar in concept but use substantially different language can be equally significant in litigation and investigations. Were this not the case, Boolean keyword searches would keep litigation professionals satisfied for years to come.
In fact, the uncompromising binary nature of keyword searches – the words are either in the document or not – makes for challenging use in the context of legal discovery. To their dismay, litigators routinely find themselves negotiating with opposing counsel over Boolean search strings of such length and complexity they could make a Silicon Valley developer wince.
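The all-or-nothing behavior is easy to see in a few lines of Python. This is a toy sketch, not any review platform's search engine: the two memos, the query terms and the helper names are invented for illustration.

```python
import re


def tokens(text):
    """Lower-case a document and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def boolean_and(doc, terms):
    """True only if every search term appears in the document."""
    words = tokens(doc)
    return all(term in words for term in terms)


# Two hypothetical memos that say much the same thing in different words.
docs = {
    "memo_a": "Please shred the audit files before the regulators arrive.",
    "memo_b": "Make sure those inspection records disappear ahead of the visit.",
}

query = ["shred", "audit"]
hits = [name for name, text in docs.items() if boolean_and(text, query)]
print(hits)  # ['memo_a']
```

The second memo is arguably just as significant as the first, yet the search never surfaces it, because neither search term appears in it.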
Far Beyond Keyword Search
Predictive coding begins with a statistical analysis of the co-occurrence of all the words in each document ingested into the system, even across millions of documents. The system then creates sophisticated models from a handful of documents judged by attorneys to be relevant to the issue at hand. It then looks across the entire data set to find more documents closely related to those models and suggests them to the attorneys for priority review.
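For readers who want to see the idea in miniature, the Python sketch below ranks documents by word co-occurrence. It is a deliberately simplified stand-in, not the algorithm used by any commercial predictive coding product: the four-document corpus is invented, and the scoring method (summing each word's co-occurrence profile, then comparing documents by cosine similarity) is an assumption chosen for brevity.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Hypothetical corpus; document 0 plays the attorney-selected relevant example.
docs = [
    "shred the audit files",
    "destroy the audit files",
    "destroy inspection records",
    "lunch menu friday",
]
word_sets = [set(d.split()) for d in docs]

# For every pair of words, count how many documents contain both.
cooc = {}
for words in word_sets:
    for w, v in product(words, repeat=2):
        cooc.setdefault(w, Counter())[v] += 1


def doc_vector(words):
    """Represent a document as the sum of its words' co-occurrence profiles."""
    vec = Counter()
    for w in words:
        vec.update(cooc[w])
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


seed = doc_vector(word_sets[0])
scores = [cosine(seed, doc_vector(ws)) for ws in word_sets]
ranked = sorted(range(1, len(docs)), key=lambda i: scores[i], reverse=True)
for i in ranked:
    print(f"{scores[i]:.2f}  {docs[i]}")
```

In this toy run, "destroy inspection records" shares no words with the seed document, yet it outranks the lunch menu because "destroy" appears elsewhere alongside "audit" and "files." Real systems work at a vastly larger scale and with more sophisticated models.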
The machine learning is enhanced on an iterative basis: As attorneys review the suggested documents and label them as relevant or irrelevant, the machine gets smarter, refining the document models for even better results in the next round, and the next. Unlike the rigid TAR models many experimented with in years past, today’s continuous machine-learning systems learn flexibly and naturally along with the reviewers, accelerating the process round after round.
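The review-and-retrain loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a vendor's implementation: the corpus and its relevance labels are invented, the "model" is a simple table of word weights nudged up or down by each reviewer decision, and the hidden labels stand in for a human attorney.

```python
from collections import Counter

# Hypothetical corpus: (text, label a human reviewer would assign when asked).
corpus = [
    ("shred audit files", True),           # seed document chosen by an attorney
    ("lunch menu friday", False),
    ("destroy audit records", True),
    ("friday parking notice", False),
    ("destroy inspection records", True),
    ("menu for team lunch", False),
]


def score(weights, text):
    """Sum the learned weights of the distinct words in a document."""
    return sum(weights[w] for w in set(text.split()))


def update(weights, text, relevant):
    """Nudge word weights up for a relevant document, down for an irrelevant one."""
    step = 1 if relevant else -1
    for w in set(text.split()):
        weights[w] += step


weights = Counter()
update(weights, corpus[0][0], True)        # train on the seed document

unreviewed = list(range(1, len(corpus)))
order = []
while unreviewed:
    # Suggest the highest-scoring unreviewed document (ties go to the earliest).
    best = max(unreviewed, key=lambda i: score(weights, corpus[i][0]))
    unreviewed.remove(best)
    order.append(best)
    update(weights, corpus[best][0], corpus[best][1])   # reviewer's decision

print(order)  # [2, 4, 1, 3, 5]
```

Here both remaining relevant documents are suggested first. Document 4 shares no words with the seed, and it surfaces only because the reviewer's decision on document 2 taught the model that "destroy" and "records" matter.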
The end result of predictive coding is that legal teams can typically find virtually all of the relevant content after reviewing as little as 10 to 30 percent of the data. Not only does this shave weeks or months off costly legal review projects, it also identifies the critical evidence far faster – a key to guiding case strategy.
As the underlying analysis is statistical in nature, the machine doesn’t need to read or understand the words used, yet it can find related documents with remarkable accuracy (even those documents that a Boolean search would have missed altogether).
Another advantage of the statistical analysis is that the effect is virtually language-agnostic; foreign-language data sets can benefit equally from predictive coding, so long as the documents used to train the system are in the same language as those sought.
Looking Forward
This year is poised to be transformational for legal technology, as awareness of and experience with predictive coding are approaching critical mass.
In 2016, OpenText, a global provider of enterprise information software, acquired Recommind, a pioneer in advanced analytics and machine learning for the legal industry. OpenText’s vision is to see predictive coding applied to nearly every matter, on virtually every data set needing legal review and analysis.
After all, who better to drive technological innovation than your venerable counsel?
Published April 28, 2017.