Data Mapping: From Recommendation to Reality

Monday, October 29, 2018 - 15:54

 

Much ink has been spilled on the requirements, burdens and benefits of the GDPR. Many commentators have offered helpful guidance for GDPR compliance. One thing everyone agrees on is a general recommendation to start with data mapping. That recommendation raises the question: what is a data map, and how do you get one?

Data mapping is the process of creating a comprehensive and accurate inventory of a company’s information assets. For specifics on making a data map we can look to information governance and litigation, where data mapping is a well-established strategic objective.

Compliance, information governance and e-discovery are distinctly different in most respects. Each has a specific objective, scope and methodology. However, they also have an important aspect in common: knowing the company’s data is critical to success. In short, data mapping is the common starting point.

Prioritize, prioritize, prioritize.

Data is one of a company’s most important assets. It must be used strategically and efficiently for maximum value. It must be kept secure against theft, unintended disclosure and cyberattacks. A typical priority for information governance is mapping customer information that provides a competitive advantage.

Yet data can equally be a significant liability. Poor data management and unneeded data are major cost drivers in litigation. Defensible deletion of unneeded data – widely estimated at a staggering 70 percent of all corporate data – is an evergreen priority for litigation preparedness.

Moreover, regulations impose burdensome ongoing legal requirements. Data must be appropriately managed and tracked. It must also be protected from data breaches. Disparate regulations give rise to a wide variety of specific compliance objectives. For instance, mapping international data transfers originating in the EU is a priority for GDPR compliance.

The first step in responding to these challenges is learning about data sources, location and volume, how the data is being used and who has access to it. The more you know about your data, the better your data-based decisions are.

Thorough planning is essential to data mapping success.

Planning is arguably the most important stage of the entire data mapping project. The plan must be well thought out, achievable and responsive to business needs. It should also align with project priorities.

Like any complex project, the data map plan needs:

  • Clearly defined objectives
  • Management buy-in
  • Specific deliverables
  • Assigned roles and responsibilities
  • An implementation timetable
  • A realistic budget

In addition, it should cover points particular to data mapping:

  • Categories to be mapped
  • Level of detail
  • Classification schemes
  • Standardized terminology

Finally, plan for ongoing updates and revisions. Infrastructure, software solutions and employee data practices all change over time. Don’t neglect future updates to the map when assigning roles and responsibilities. Any map will quickly become outdated if it isn’t updated regularly.

Data mapping builds a comprehensive and accurate inventory.

Most data maps include the following categories:

  1. Source: The foundation of the data map is identification of each unique data source. A data source may be an asset group or a standalone repository. The most common corporate sources are file servers, document management systems, email systems, messaging systems, workstations, databases, mobile devices, cloud repositories, electronic archives, backups and hard copy archives.
  2. Location: Location should be tracked in a way that has meaning for the data source. Hardware data sources like servers and workstations have a physical location. Other sources are part of an IT system, such as a database hosted on a server. The third-party provider is the location for hosted data.
  3. Data Types: Data types generally follow from the source. For example, MS Office files and PDFs are the standard data types found on a desktop computer. However, some research may be needed to determine data types for proprietary software and multi-purpose sources.
  4. Volume: Standard IT software can be used to auto-inventory data volume on company systems. You may have to rely on reasonable estimates for data in the custody of third parties and employees.
  5. Status: Typical status classifications are active, off-line, archived, legacy and backup.
  6. Access: There are two related aspects to access. First is the practical question of how employees actually work with the data. Second is whether there is any anticipated delay, cost or difficulty in accessing the data. The latter is closely tied to status – whether the data is active, off-line, etc.
  7. Business Owner: Each source should be “owned” by a business unit or individual employee. While a source may have multiple owners, it’s helpful for information governance to designate a primary data custodian. Control, responsibility for updating, frequency of use and access rights are pertinent factors. Orphaned data that no one claims ownership of is a prime candidate for defensible deletion.
  8. Business Value: Assessing the data’s value to the company is essential to information security and risk management. The business owner is responsible for making value assessments based on, among other factors, content, how the data is used, frequency of use, age, general business needs and the existence of duplicate or overlapping sources.
  9. Confidentiality Classification: Corporate information security policies should include a classification system for confidentiality levels; for example, public, confidential and trade secret. More granular classification may be used for highly sensitive data types like financial information and patient records.
  10. Non-company Owned Data: Identify sources containing non-company owned data; for example, consumer, patient, supplier and employee data. This is extremely important in compliance and information security.
  11. Security: Mapping should include technical information on security measures for each data source. This supports periodic security audits and evaluations, increasingly essential in the face of growing cyber threats. Is the existing level of security sufficient in light of value, confidentiality classification and ownership?
  12. Backups: Finally, cross-reference each source with the applicable backup system, schedule and retention period. It’s common to have different backup protocols for critical and non-critical sources.
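
The twelve categories above can be pictured as the columns of one record per data source. The following is a minimal sketch in Python, not a prescribed schema: the class name `DataSource`, the field names, the helper `orphaned` and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """Status classifications named in item 5."""
    ACTIVE = "active"
    OFFLINE = "off-line"
    ARCHIVED = "archived"
    LEGACY = "legacy"
    BACKUP = "backup"


@dataclass
class DataSource:
    """One row of the data map; field names are hypothetical."""
    source: str
    location: str
    data_types: List[str]
    volume_gb: Optional[float]        # None until measured or estimated
    status: Status
    access_notes: str
    business_owner: Optional[str]     # None means no one has claimed the source
    business_value: str
    confidentiality: str              # e.g. "public", "confidential", "trade secret"
    non_company_data: List[str] = field(default_factory=list)
    security_measures: List[str] = field(default_factory=list)
    backup_policy: Optional[str] = None


def orphaned(sources: List[DataSource]) -> List[DataSource]:
    """Sources with no owner: candidates for defensible-deletion review."""
    return [s for s in sources if s.business_owner is None]


# Invented sample entries, for illustration only.
data_map = [
    DataSource(
        source="Corporate email",
        location="Third-party hosting provider",
        data_types=["email", "attachments"],
        volume_gb=850.0,
        status=Status.ACTIVE,
        access_notes="Daily use by all employees",
        business_owner="IT",
        business_value="high",
        confidentiality="confidential",
        non_company_data=["customer", "employee"],
        security_measures=["encryption at rest", "multi-factor authentication"],
        backup_policy="nightly, 90-day retention",
    ),
    DataSource(
        source="Legacy file share",
        location="Server room B",
        data_types=["MS Office files", "PDFs"],
        volume_gb=None,
        status=Status.LEGACY,
        access_notes="Read-only; restore takes several days",
        business_owner=None,
        business_value="unknown",
        confidentiality="confidential",
    ),
]

for s in orphaned(data_map):
    print(f"No owner recorded: {s.source} ({s.status.value})")
```

Leaving the owner and volume fields optional reflects the point that some information will be missing or estimated at first. An empty owner field then doubles as the flag for orphaned data.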

Additional categories are mapped for compliance. This begins, naturally, with identifying the applicable U.S. and/or foreign regulations.

Particular mapping requirements are determined by the regulation at hand. For example, the GDPR requires companies to track an enormous amount of information about the personal data of EU residents. This includes, to list just a few categories, the data subject’s consent, the existence of sensitive information, the date and purpose of data collection, and transfers from data controllers to data processors.
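
As a rough illustration, the few GDPR categories listed above could be carried as extra fields attached to each mapped source. This is a sketch under stated assumptions: the class `PersonalDataRecord`, its field names, the sample entries and the follow-up rule in `needs_follow_up` are hypothetical, and the rule is a simple triage aid rather than a statement of what the regulation requires.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PersonalDataRecord:
    """Compliance fields for one source; names are hypothetical."""
    source: str
    consent_recorded: bool
    contains_sensitive_data: bool
    collected_on: date
    purpose: str
    processors: List[str] = field(default_factory=list)  # controller-to-processor transfers


def needs_follow_up(record: PersonalDataRecord) -> bool:
    """Assumed triage rule: flag entries with no consent record or no stated purpose."""
    return (not record.consent_recorded) or (not record.purpose.strip())


# Invented sample entries, for illustration only.
records = [
    PersonalDataRecord(
        source="Newsletter sign-up database",
        consent_recorded=True,
        contains_sensitive_data=False,
        collected_on=date(2018, 3, 1),
        purpose="Marketing email",
        processors=["Example Mail Vendor"],
    ),
    PersonalDataRecord(
        source="Legacy CRM export",
        consent_recorded=False,
        contains_sensitive_data=True,
        collected_on=date(2015, 6, 15),
        purpose="",
    ),
]

flagged = [r.source for r in records if needs_follow_up(r)]
print(flagged)
```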

Data mapping starts broad and gradually narrows focus.

A typical consulting plan is to meet first with IT to gain an understanding of systems, assets, retention policy and practice, employee separation procedures, archives, backup systems and outsourcing of data storage or management. IT is the primary authority on sources such as corporate email and backups.

The second step is to consult with business unit leaders about needs and general data practices. They will flag the seemingly inevitable repositories and software programs IT doesn’t know about. Next, meet with records managers about specific document management systems, databases and file rooms.

At this point in the process enough information has been gathered to build the data map in outline form. Request details on format, volume and security measures from IT. Continue to narrow the focus by gradually filling in gaps and resolving inconsistencies.

The usual format for a data map is a spreadsheet. Supporting documentation can include flow charts, a glossary of technical terms, source summaries, interview notes, data volume reports and physical asset inventories.
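
For readers who keep the map as a spreadsheet, the sketch below shows one way to write the outline to CSV, which any spreadsheet program can open, and to list the cells that still need to be filled in. It uses only Python's standard `csv` and `io` modules. The column names, the sample rows and the helpers `to_csv` and `missing_cells` are hypothetical.

```python
import csv
import io

# Hypothetical column set; a real map would carry every planned category.
COLUMNS = ["source", "location", "data_types", "volume", "status", "business_owner"]

# Invented sample rows; empty strings mark information not yet gathered.
rows = [
    {"source": "Corporate email", "location": "Third-party hosting provider",
     "data_types": "email; attachments", "volume": "850 GB",
     "status": "active", "business_owner": "IT"},
    {"source": "Marketing shared drive", "location": "File server 2",
     "data_types": "MS Office files; PDFs", "volume": "",
     "status": "active", "business_owner": ""},
]


def to_csv(rows, columns=COLUMNS):
    """Serialize the map to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def missing_cells(rows, columns=COLUMNS):
    """List (source, column) pairs that are still blank."""
    return [(row["source"], col) for row in rows for col in columns if not row.get(col)]


csv_text = to_csv(rows)
print(csv_text)
for source, column in missing_cells(rows):
    print(f"Gap: {source} is missing '{column}'")
```

The list of gaps can serve as the agenda for follow-up questions to IT and the business units.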

Enterprise data mapping software solutions automate some parts of the process and can be used to generate the map instead of manually creating a spreadsheet. Large corporations with complex IT systems and companies in highly regulated industries should evaluate investing in data mapping software.

Legal, compliance, IT, management, finance and HR are all key stakeholders in data mapping. Ask for their review and feedback along the way. Successful projects are both collaborative and iterative.

The GDPR has recently brought data mapping front and center. Whether your goal is compliance, information governance or e-discovery, the starting point is knowing your data: what you have, where it is, who uses it and why. Start by picking a priority project with a defined scope. Make your first data mapping success the springboard to comprehensive data management.


Helen Geib is general counsel and practice support consultant for QDiscovery. Prior to joining QDiscovery, Helen practiced law in the intellectual property litigation department of Barnes & Thornburg’s Indianapolis office where her responsibilities included managing large scale discovery and motion practice. She brings that experience and perspective to her work as an eDiscovery consultant. She also provides trial consulting services in civil and criminal cases. Helen has published articles on topics in e-discovery and trial technology. She is a member of the bar of the State of Indiana and the U.S. District Court for the Southern District of Indiana and a registered patent attorney.