Published April 8, 2017.
Information Governance Insights: Preserving and Collecting Structured Electronic Data Is Tricky
Today’s litigators are keenly aware of the need to effectively preserve electronically stored information (ESI) when new legal matters arise, and dealing with unstructured data such as user-created documents and email messages has become fairly routine. But preserving and collecting structured ESI from enterprise-wide database systems presents unique e-discovery challenges, and those challenges get far less attention.
A typical medium-sized enterprise has hundreds or even thousands of structured business applications that store financial, product, customer, employee and other material information. These systems are often the lifeblood of the company and the system of record for all business-related activities. Given their importance, e-discovery compliance programs must also incorporate processes for the preservation and collection of ESI from these systems – not just focus on email and file servers.
While documents and email are generally self-contained and static, the information in structured data systems is dynamic and constantly changing. Data in one system or table is also often highly dependent on data in others, and the databases themselves tend to have long lifecycles and evolve continually. This makes discovery on structured data orders of magnitude more difficult. Without a robust e-discovery program that directly addresses these issues, IT organizations are forced to overpreserve and retain large amounts of information, which can hurt both system performance and budgets.
The main challenges in building a successful e-discovery and compliance program for structured data fall into four areas:
1. Identification
The proper identification of the specific information or data objects within the system that are subject to retention obligations is central to ensuring that the company is neither overpreserving nor underpreserving information. In addition to the records they contain, databases comprise a variety of other elements, including reports, printouts, queries, application layer source code, pick lists and more. The relevance of these elements to an e-discovery request depends on the system, the business, the industry and the type of legal action. Yet instructions from the legal department often simply say “preserve all payroll records.” This offers little actionable guidance to system administrators. Counsel must make an effort to identify the specific elements needed so that they can provide actionable and auditable instructions.
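To make the difference between a vague and an actionable instruction concrete, here is a minimal sketch of a preservation instruction captured as a structured record. The record format, its field names, and the table and report names are all hypothetical assumptions for illustration, not an industry standard or any product's schema. The idea is simply that an auditable instruction names a system, specific data objects and a bounded time period.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


# Hypothetical record format for a preservation instruction; the field names
# are illustrative assumptions, not an industry standard.
@dataclass
class PreservationInstruction:
    system: str                                        # system of record to preserve from
    tables: List[str] = field(default_factory=list)    # specific tables or data objects
    reports: List[str] = field(default_factory=list)   # standing reports, printouts, saved queries
    start: Optional[date] = None                       # start of the relevant period
    end: Optional[date] = None                         # end of the relevant period

    def is_actionable(self) -> bool:
        # Treat an instruction as actionable only if it names a system, at least
        # one concrete data object, and a bounded time period.
        return bool(self.system and (self.tables or self.reports)
                    and self.start and self.end)


# "Preserve all payroll records" taken literally: a system, but nothing concrete.
vague = PreservationInstruction(system="Payroll")

# Hypothetical table and report names for the same request, made specific.
specific = PreservationInstruction(
    system="Payroll",
    tables=["employees", "pay_stubs", "tax_withholdings"],
    reports=["quarterly_payroll_summary"],
    start=date(2014, 1, 1),
    end=date(2016, 12, 31),
)

print(vague.is_actionable())     # False
print(specific.is_actionable())  # True
```

A record like this also doubles as audit evidence: it shows exactly what the system administrator was asked to keep.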
2. Preservation and collection
Once identified, the specified data must be preserved and collected. High-volume transactional information often has very short retention periods, and other data is regularly overwritten and updated. This means counsel must decide quickly what to preserve. Another challenge is deciding how to preserve this data. It must be captured in a way that preserves the integrity of the relationships within the system, or it may be rendered meaningless. Because these systems evolve organically over time, they often lack descriptive documentation, and it can be difficult to find employees with sufficient knowledge of all their intricacies. Therefore, an iterative investigation is often necessary, along with a bit of trial and error during the extraction process.
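As an illustration of what preserving relationships means in practice, the minimal sketch below assumes a hypothetical payroll database in which pay stubs reference employees by ID. It collects the pay stubs for a date range together with every employee row they point to, then checks the collection for orphaned records. The table names, columns and data are invented for the example; a real system would have many more dependencies.

```python
import sqlite3

# Hypothetical source system: pay stubs reference employees by emp_id.
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pay_stubs (
    stub_id  INTEGER PRIMARY KEY,
    emp_id   INTEGER REFERENCES employees(emp_id),
    pay_date TEXT,
    amount   REAL
);
INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO pay_stubs VALUES
    (10, 1, '2016-01-15', 2500.0),
    (11, 2, '2016-01-15', 3100.0),
    (12, 3, '2015-06-30', 2800.0);
""")


def collect_with_parents(conn, start, end):
    """Collect pay stubs in a date range plus every employee row they reference."""
    stubs = conn.execute(
        "SELECT stub_id, emp_id, pay_date, amount FROM pay_stubs "
        "WHERE pay_date BETWEEN ? AND ? ORDER BY stub_id",
        (start, end),
    ).fetchall()
    emp_ids = sorted({row[1] for row in stubs})
    if not emp_ids:
        return stubs, []
    placeholders = ",".join("?" * len(emp_ids))
    employees = conn.execute(
        f"SELECT emp_id, name FROM employees WHERE emp_id IN ({placeholders}) "
        "ORDER BY emp_id",
        emp_ids,
    ).fetchall()
    return stubs, employees


def orphaned_stubs(stubs, employees):
    """Return collected stubs whose employee row is missing from the collection."""
    known = {emp_id for emp_id, _ in employees}
    return [row for row in stubs if row[1] not in known]


stubs, employees = collect_with_parents(src, "2016-01-01", "2016-12-31")
print(len(stubs), len(employees))        # 2 2
print(orphaned_stubs(stubs, employees))  # []
print(len(orphaned_stubs(stubs, [])))    # 2 (stubs collected without their parents)
```

The last line shows the failure mode the paragraph warns about: pay stubs exported without the employee records they depend on are just numbers with no one attached to them.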
Another common collection challenge arises when defining queries. Requests that seem simple and clearly defined to the legal team can be unclear to a database administrator. For example, you may request a report containing “all employees in the state of New York.” But how do you define a New York employee? Is it by the address where their paycheck is sent or the zip code of the facility they are assigned to? What about an employee who is assigned to a facility in New York but lives in New Jersey and is always on the road covering sales territory in Pennsylvania?
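The sketch below shows how the same plain-language request can produce different results depending on how it is translated into a query. It assumes a hypothetical schema with an employee's home state (standing in for the paycheck address) and an assigned facility. For brevity it uses states rather than zip codes. Neither definition is "correct"; the point is that counsel and the database administrator need to agree on one explicitly.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facilities (facility_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE employees (
    emp_id      INTEGER PRIMARY KEY,
    name        TEXT,
    home_state  TEXT,  -- state of the address where the paycheck is sent
    facility_id INTEGER REFERENCES facilities(facility_id)
);
INSERT INTO facilities VALUES (100, 'NY'), (200, 'NJ');
INSERT INTO employees VALUES
    (1, 'Ana', 'NY', 100),  -- lives and works in New York
    (2, 'Ben', 'NJ', 100),  -- lives in New Jersey, assigned to a New York facility
    (3, 'Cy',  'NY', 200);  -- lives in New York, assigned to a New Jersey facility
""")

# Definition A: a "New York employee" is paid at a New York address.
by_home = [r[0] for r in conn.execute(
    "SELECT name FROM employees WHERE home_state = 'NY' ORDER BY emp_id")]

# Definition B: a "New York employee" is assigned to a New York facility.
by_facility = [r[0] for r in conn.execute(
    "SELECT e.name FROM employees e "
    "JOIN facilities f ON e.facility_id = f.facility_id "
    "WHERE f.state = 'NY' ORDER BY e.emp_id")]

print(by_home)      # ['Ana', 'Cy']
print(by_facility)  # ['Ana', 'Ben']
```

Both queries are reasonable readings of "all employees in the state of New York," yet each returns a different set of people.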
3. Validation and authentication
Once the required information has been collected, it still must be validated and authenticated. Validation is the technical process of determining whether the extraction was accurate and complete. For example, did the queries used actually return the right employees, and were any records inadvertently omitted? This process typically relies on a combination of techniques, which may include comparisons of record counts, fielded values, and summarized or consolidated reports from both the source system and the exports.
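A minimal sketch of validation by comparison, assuming a hypothetical pay_stubs table exported to CSV: it compares the record count, a summed fielded value and the set of record IDs between the source and the export. A truncated export fails all three checks. Real validation would usually compare more fields and use the system's own summary reports, and exact float comparison is only adequate here because the amounts are rounded.

```python
import csv
import io
import sqlite3

# Hypothetical source table, invented for illustration.
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE pay_stubs (stub_id INTEGER PRIMARY KEY, emp_id INTEGER, amount REAL);
INSERT INTO pay_stubs VALUES (10, 1, 2500.00), (11, 2, 3100.00), (12, 3, 2800.00);
""")


def export_csv(conn):
    """Export the table to CSV text, as an extraction step might."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["stub_id", "emp_id", "amount"])
    writer.writerows(conn.execute(
        "SELECT stub_id, emp_id, amount FROM pay_stubs ORDER BY stub_id"))
    return buf.getvalue()


def validate(conn, exported_text):
    """Compare record count, a summed field and record IDs between source and export."""
    rows = list(csv.DictReader(io.StringIO(exported_text)))
    src_count, src_total = conn.execute(
        "SELECT COUNT(*), ROUND(SUM(amount), 2) FROM pay_stubs").fetchone()
    exp_total = round(sum(float(r["amount"]) for r in rows), 2)
    src_ids = {r[0] for r in conn.execute("SELECT stub_id FROM pay_stubs")}
    exp_ids = {int(r["stub_id"]) for r in rows}
    return {
        "count_matches": src_count == len(rows),
        "total_matches": src_total == exp_total,
        "missing_ids": sorted(src_ids - exp_ids),
    }


full = export_csv(src)
# Simulate an incomplete extraction by dropping the last data row.
truncated = "\r\n".join(full.splitlines()[:-1]) + "\r\n"

print(validate(src, full))
# {'count_matches': True, 'total_matches': True, 'missing_ids': []}
print(validate(src, truncated))
# {'count_matches': False, 'total_matches': False, 'missing_ids': [12]}
```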
Authentication is a legal construct concerned with tracing the chain of custody from the physical copy of the information you are producing to opposing parties all the way back to the system of record. This important process ensures that the information is what it purports to be, hasn’t been changed or tampered with along the way, and is still accurate and complete. Frequently, the internal IT team executes the data extracts with help from external resources. Service providers may perform some filtering and sorting of this data, and outside counsel may mask or redact certain elements, such as Social Security numbers. When the exports are finally disclosed, it is important to have a fully documented chain of custody describing who pulled the data, who else touched the data along the way, and exactly what was done to it at each step. In addition to signed chain of custody forms, documentation typically includes a copy of the original queries or specifications used for the extractions, record counts, and MD5 hash values.
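The sketch below shows one way the documentation described above might be captured programmatically: each step that touches an export produces a log entry with the actor, the action, a record count, the original query (where applicable) and an MD5 hash of the file's bytes. The log format, actors and data are hypothetical, and a log like this supplements, rather than replaces, signed custody forms. Because any change to the bytes changes the hash, a downstream recipient can confirm which version of the file they hold.

```python
import hashlib
import json
import re
from datetime import datetime, timezone


def custody_entry(step, actor, action, data, record_count, query=None):
    """Build one hypothetical chain-of-custody log entry for a version of an export."""
    return {
        "step": step,
        "actor": actor,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "record_count": record_count,
        "md5": hashlib.md5(data).hexdigest(),
        "query": query,  # the original extraction query, retained verbatim
    }


def verify(entry, data):
    """Check that a file received downstream is byte-for-byte what the entry describes."""
    return hashlib.md5(data).hexdigest() == entry["md5"]


# Hypothetical export containing fictitious Social Security numbers.
original = b"stub_id,ssn,amount\r\n10,111-11-1111,2500.0\r\n11,222-22-2222,3100.0\r\n"
# Outside counsel masks anything shaped like an SSN.
redacted = re.sub(rb"\d{3}-\d{2}-\d{4}", b"XXX-XX-XXXX", original)

log = [
    custody_entry(1, "Internal IT", "Extracted pay stubs from payroll system",
                  original, record_count=2,
                  query="SELECT stub_id, ssn, amount FROM pay_stubs ORDER BY stub_id"),
    custody_entry(2, "Outside counsel", "Masked Social Security numbers",
                  redacted, record_count=2),
]

print(json.dumps(log, indent=2))
print(verify(log[0], original))  # True
print(verify(log[1], original))  # False: redaction changed the bytes
print(verify(log[1], redacted))  # True
```

The same functions work unchanged with `hashlib.sha256` if a stronger digest than MD5 is preferred.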
4. Custody and control
Believe it or not, e-discovery does have some limits. One of these is that parties are only obligated to disclose information within their custody and control. Custody and control is generally easy to establish for unstructured data such as email and user files, but it can be much trickier with structured databases. This is because not all data elements within a large enterprise database are necessarily owned or controlled by the same entity, and portions of the data processing or aggregation may be outsourced to third parties. When building an e-discovery program, it is very important to determine exactly who owns and controls which elements at any given point in time, and therefore which portions of the system may be subject to each entity’s preservation and production obligations.
This issue is further complicated by the use of cloud services. An organization commonly owns and controls the transactional data in a system but has no control over the source code, architecture and algorithms, or even over certain aggregated or processed outputs of the database. For example, Salesforce.com considers its database structure a proprietary trade secret, and disclosing the tables, fields and relationships within the database during e-discovery would likely breach its end-user license agreement. The same holds true for many solutions offered by Google Apps.
Corporate governance and structure can also complicate custody and control issues. For example, a parent or subsidiary company may own or control the database and database software yet lease its use to junior entities or to other subsidiaries. In these cases, it may not be clear who controls what at any given time, creating complex legal questions to work through before responding to discovery from the system.
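One way to keep the "who controls what, and when" question answerable is to maintain a simple register of control periods for each element of a system. The sketch below assumes a hypothetical register in which transaction records moved from a parent company to a subsidiary and a cloud vendor retains the schema and source code. The entities, elements and dates are invented. A lookup returning no one flags an element whose custody has not yet been worked out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ControlPeriod:
    element: str           # a data element or system component
    entity: str            # who owns or controls it during this period
    start: date
    end: Optional[date]    # None means the arrangement is still in effect


def controller(periods: List[ControlPeriod], element: str, on: date) -> Optional[str]:
    """Return the entity controlling an element on a given date, or None if unknown."""
    for p in periods:
        if p.element == element and p.start <= on and (p.end is None or on <= p.end):
            return p.entity
    return None


# Hypothetical register; all names and dates are illustrative assumptions.
register = [
    ControlPeriod("transaction records", "Parent Co.", date(2012, 1, 1), date(2015, 6, 30)),
    ControlPeriod("transaction records", "Subsidiary A", date(2015, 7, 1), None),
    ControlPeriod("database schema and source code", "Cloud vendor", date(2012, 1, 1), None),
]

print(controller(register, "transaction records", date(2014, 3, 1)))              # Parent Co.
print(controller(register, "transaction records", date(2016, 3, 1)))              # Subsidiary A
print(controller(register, "database schema and source code", date(2016, 3, 1)))  # Cloud vendor
print(controller(register, "aggregated reports", date(2016, 3, 1)))               # None
```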
Given these complexities, and others such as data privacy issues, it is no wonder that counsel often gets tripped up when attempting to meet discovery obligations for structured data systems. A little advance planning and investigation before litigation strikes can greatly ease the process when timely disclosure becomes necessary. It can also help to enlist a trusted advisor with both technical and legal acumen, and with experience in e-discovery and structured systems, to help work through these complex issues.