Digital Investigations - Where You Forgot To Look: Why Databases Often Are Overlooked When It Comes Time to Harvest Electronic Data

With more than 90% of all documents produced since 1999 created and stored electronically, electronic discovery is becoming mainstream in civil discovery. As a result, familiarity with email data sources, data that can be extracted from a user's hard drive, centralized file stores (that may contain Microsoft Word, Excel or similar documents), backup tapes (whether stored on-site or off-site), and BlackBerry and other similar digital storage devices is becoming commonplace. What is often overlooked when preparing to harvest electronic data are the centralized databases that may exist within an organization. The information gleaned from the data compiled and processed in these databases is typically considered the "lifeblood" of an organization and often forms the basis on which decision makers make informed decisions.

Databases come in all sorts of sizes and complexities. Most are easily disguised behind a "fancy user interface," which can be programmed to enter, update, and even search data. Typical centralized database stores within an organization include Customer Relationship Management (CRM) systems and Enterprise Resource Planning (ERP) products, not to mention similar databases used to track Human Resources (HR) and other functions within an entity structure.

A CRM database is typically where an organization stores data related to the activities of its sales force and tracks the success of marketing initiatives and customer service activities. ERP systems include modules that manage an organization's supply chain, inventory levels, financials, manufacturing operations, and, to some extent, certain human resource information. CRM and ERP systems can therefore track and process large volumes of data from nearly every part of an organization. When they are used as designed, almost all personnel within an organization may have a role requiring that they either insert or retrieve specific data from the company's database-driven system.

There are several schools of thought about how to extract, review and produce relevant data contained within CRM and ERP databases. On one hand, a corporation served with a discovery request may be required to produce a full copy of its database system so that the opposing party can have access to all of the underlying data. This access may enable the opposing party to hire a staff of professionals, query the data set and run reports from it, potentially exposing trends that might be relevant to the matter at hand. On the other hand, other legal professionals familiar with electronic discovery believe that the database itself should not be produced as a result of a discovery request, but only the reports that are regularly run from the system.

At first glance, the first option may appear to be the easier and more reasonable approach. However, a number of issues have to be carefully weighed before agreeing to proceed along these lines. Most large databases like CRM and ERP systems contain thousands, if not millions, of records. These databases may be relational in nature, meaning that the data stored within them is broken out into and spread across large numbers of individual tables, which may be linked together and optimized for the types of queries that are regularly run against the system. In other words, information as simple as the name, phone number and address for a given client could be broken out into at least a dozen different fields and spread across a large number of tables. While the data is held in this format, it is relatively useless until the reporting module associated with the system is programmed to piece it all together into something meaningful, such as regional trends in sales or the inventory and location of base parts for a given product. In short, the data within a given database is not necessarily of any value to an organization until it has been organized into reports demonstrating trends and consistencies that enable business leaders to make informed decisions guiding the company along a given course of activity.
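To make the point about relational storage concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample records are hypothetical and far simpler than any real CRM or ERP schema; the sketch only illustrates how a client's details and sales sit in separate tables until a query joins them into something readable.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical, normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clients   (client_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE phones    (client_id INTEGER, phone TEXT);
        CREATE TABLE addresses (client_id INTEGER, city TEXT, region TEXT);
        CREATE TABLE orders    (order_id INTEGER PRIMARY KEY,
                                client_id INTEGER, amount REAL);

        INSERT INTO clients   VALUES (1, 'Acme Corp'), (2, 'Globex');
        INSERT INTO phones    VALUES (1, '555-0100'), (2, '555-0199');
        INSERT INTO addresses VALUES (1, 'Boston', 'East'), (2, 'Denver', 'West');
        INSERT INTO orders    VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """)
    return conn


def contact_card(conn, client_id):
    """Reassemble one client's name, phone and city from three tables."""
    return conn.execute("""
        SELECT c.name, p.phone, a.city
        FROM clients c
        JOIN phones p    ON p.client_id = c.client_id
        JOIN addresses a ON a.client_id = c.client_id
        WHERE c.client_id = ?
    """, (client_id,)).fetchone()


def sales_by_region(conn):
    """A 'report': total order amount per region, sorted by region name."""
    return conn.execute("""
        SELECT a.region, SUM(o.amount)
        FROM orders o
        JOIN addresses a ON a.client_id = o.client_id
        GROUP BY a.region
        ORDER BY a.region
    """).fetchall()
```

Looking at any one of these tables alone shows only identifiers and fragments. The regional sales figure exists only once a query such as `sales_by_region` has joined and aggregated the rows, which is roughly the job a reporting module performs on a much larger scale.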

The most important item to understand when faced with data extraction from relational databases is that they are designed to be efficient, highly normalized data warehouses that can hold exceptionally large quantities of information. Extracting data from a relational structure such as a CRM or ERP database requires specific expertise and a solid understanding of how these databases work.

Some other important factors to consider when requesting a copy of a given database system are the type of system on which it runs and the type of professionals assigned to manage the day-to-day activities associated with the system. All too often an attorney may request a backup of a given database, only to find that once it is delivered, there are only a limited number of systems available that may be able to correctly configure, restore and run the database.

The cost of re-creating some of these operating environments can be staggering, especially given the large number of legacy mainframe-run systems that may have been used. An additional consideration is that there is usually a team of highly trained database administration professionals within the organization whose primary task is to manage the database, ensuring that it runs efficiently and remains available for the organization to use. These professionals are often costly to hire, and the activities they perform in managing the system are highly technical in nature. Some systems, especially older ones, may have been grouped together as a result of corporate mergers and acquisitions and may not operate efficiently or may not be stable (depending upon the systems on which they were installed).

In the many cases in which we have been involved where the production of databases has been a central issue, we have rarely found a situation in which a complete copy of a given system could be backed up from its native environment and easily restored to a secondary system, allowing it to function in its original configuration.

On the other hand, we have had certain successes when utilizing the "report extraction" approach. In this approach, the client methodically determines what centralized databases exist within the organization, the function of each and how employees use the data stored within them. In addition, as with a typical email store, we make sure that we know the backup and purge schedule for each database system. Databases work differently than other electronic systems in that they tend to amass huge volumes of data, which may eventually slow a system down. As a result, some, if not most, databases have a regularly scheduled purge, in which old data that has not been used for a while is exported out of the primary tables and either deleted or "dumped" into secondary tables or systems for long-term storage.
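As a rough illustration of why the purge schedule matters, the sketch below (again using Python's sqlite3 module) moves rows older than a cutoff date from a primary table into an archive table. The `orders` and `orders_archive` tables, their columns and the sample dates are hypothetical; real systems implement purges in many different ways, and some delete the data outright.

```python
import sqlite3


def build_orders_db():
    """Create a hypothetical primary table and a matching archive table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             order_date TEXT, amount REAL);
        CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY,
                                     order_date TEXT, amount REAL);
        INSERT INTO orders VALUES (1, '2001-03-15', 100.0),
                                  (2, '2004-07-01', 250.0),
                                  (3, '2005-01-20', 75.0);
    """)
    return conn


def purge_old_records(conn, cutoff_date):
    """Move orders dated before cutoff_date (ISO 'YYYY-MM-DD') to the archive.

    Returns the number of rows removed from the primary table.
    """
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "INSERT INTO orders_archive "
            "SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff_date,)
        )
    return cur.rowcount
```

After a purge like this one, a query or report run only against the primary `orders` table no longer sees the older records. That is why it helps to know both when purges run and where, if anywhere, the purged data goes.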

Another factor is that once data is extracted from the primary database into a report-worthy format, additional manipulations may have to be performed on it before the final product is presented to the requestor. A review of the raw data from the system would not show these additional manipulations, even with a complete copy of the database available for queries.

Once counsel or other client representatives understand the function of the database, the data it contains, and the length of time for which the data is retained, the next step is to figure out how the data contained within the system is processed into reports and, most importantly, how those reports are used to make informed decisions. It is typically those reports that are of most value within a given discovery dispute.

While both of these approaches have their merits, each also has its own set of shortcomings. The critical issue to weigh when choosing what to ask for in a discovery request, or when deciding what to produce or resist producing in response to one, is which approach will work better in the overall framework of your investigation and how it will help to bolster or refute the claims made within the case.

Electronic discovery issues relating to databases are not new, but they are often overshadowed by voluminous data sources such as emails and user files. As with all electronic discovery projects, a well-planned, methodical approach is necessary to ensure an effective outcome. In this regard, we suggest the following steps as the framework from which to build your electronic discovery plan:


Identify the data landscape associated with your client's electronic infrastructure. Determine what electronic data they have, where it is stored, and how far back the records exist.


Establish a collection and processing plan, and follow that plan. Failure to establish an effective plan and to strictly follow it may expose you and your client to spoliation arguments or court sanctions for failure to harvest and produce all of the data deemed "responsive" to the demands of opposing counsel.


With the data landscape in mind, determine what data should be collected as part of the harvesting process.


Employ an outside vendor to advise and assist your client's IT department with expedient, forensically sound data collection before any collection commences.


Ensure that a strict chain of custody is maintained for each piece of electronic data harvested.


Obtain the assistance of an outside vendor as needed to restore the electronic data, process it, and cull the data set to a reasonable size in preparation for review.

Armed with these critical items, you will be in a much better position to argue the merits of the case, rather than engage in disputes over electronic discovery issues.
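One common way to support the chain-of-custody step listed above is to record a cryptographic hash of each harvested file at the time of collection, so that any later change to the file can be detected. The following is a minimal sketch using Python's standard hashlib module. The record fields, file name and collector name are hypothetical, and a real custody log would follow whatever procedures counsel and the forensic vendor agree upon.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path, collected_by, source):
    """Build a hypothetical custody log entry for one harvested file."""
    return {
        "file": str(path),
        "sha256": sha256_of_file(path),
        "collected_by": collected_by,
        "source": source,
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def matches_entry(path, entry):
    """True if the file's current hash equals the hash recorded at collection."""
    return sha256_of_file(path) == entry["sha256"]


def demo():
    """Log a temporary file, then alter it and check it against the entry."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "crm_report_export.csv")
        with open(path, "wb") as fh:
            fh.write(b"region,total\nEast,350\n")
        entry = custody_entry(path, "J. Examiner", "CRM sales report")
        unchanged = matches_entry(path, entry)   # file not yet touched
        with open(path, "ab") as fh:
            fh.write(b"West,75\n")               # simulate a later alteration
        changed = matches_entry(path, entry)
    return unchanged, changed
```

A hash does not show who handled a file or when, so it complements rather than replaces the written custody record. What it does provide is a simple check that the data reviewed is the same data that was collected.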
