Data breaches, and the exfiltration of data that comes with them, are a harsh reality of our digital world. Business-savvy organizations take all available steps to protect sensitive and confidential information from exfiltration and misuse, but hackers are sophisticated and don’t give up easily.
Bad actors are looking for any number of items when they infiltrate an organization’s databases or email stores. Often the modus operandi is more of a smash and grab than a carefully calculated exfiltration of specific “high value” data. In the blink of an eye, the hackers gain access to a broad swath of an organization’s data, grab everything they can, and disappear into cyberspace.
Most of us are familiar with the who, what, when, where, why and how of a journalistic inquiry. Data breaches and cyber incidents require a different analysis, one that starts with the how, where and when before the what and who are established. Determining exactly what personal or confidential information is contained in the body of data taken, and to whom that information belongs, can be daunting, particularly where large volumes of data are implicated.
This blog will explore the teams and technologies that can be used to optimize all phases of breach response – from multi-disciplinary teams who specialize in determining the how, when and where, to a specialized strike team applying an analytical approach to uncover the what and who.
Determining how, when and where
The first step for any organization faced with a data breach or cyber security incident is to assess how, where and when the hackers got in. This is critical so that the organization can take proactive steps to prevent future incidents. Generally, the targeted organization will retain expert data forensic assistance right away, while also putting its insurers on notice and retaining outside counsel to help assess the scope of the breach and manage risk.
The next step is generally to have forensic experts determine the scope of the breach. Unlike a traditional journalistic inquiry, this is not yet the inquiry into what data was breached, but rather a determination of the basic scope or outline of the data breached. If the incident is a smash and grab, a vast quantity of data may be exfiltrated. In those circumstances, identifying the total corpus of data compromised is only the start of the process: the organization knows just the broad outline of the data exposed.
Now it is time to find out what was breached and who is impacted (to whom the personal information belongs).
Determining what and who
Once the initial forensic team has identified the outline of the body of data that was compromised, the organization needs a deeper understanding of what data was taken and who is identified in that data. That understanding is needed to assess the potential risk and damage associated with the breach, to determine whether legislative reporting obligations have been triggered, and ultimately to fulfill those reporting obligations where required.
The challenge in most instances is that vast quantities of data will have been compromised. While the goals and workflows of data breach analysis and reporting are distinct from those of a litigation eDiscovery review for production, some of the same advanced tools and analytics created for eDiscovery can be used to maximize efficiency. Efficiency is critical because in data breach analysis, even more than in eDiscovery, time is of the essence and budgets are tight. The organization is often obliged by statute to notify individuals whose personal data may have been compromised within a specified period.
Organizations are also anxious to contain risk and reputational damage by identifying the data breached as quickly as possible. Conducting a linear review of every document within a compromised data set would be cost prohibitive and slow. Throwing bodies at the entire corpus of data yields uneven results and escalates costs unnecessarily.
Leveraging data analytics and data interrogation specialists
Where stakes are high, time is short, and budgets are low, the most effective and efficient way to determine the “what and who” of post-breach analysis and reporting is to engage an investigation strike team of data interrogation experts. Such a team applies data interrogation techniques and every high-tech tool in the toolbox to separate the garbage data from the nuggets of personal and business confidential information that the hackers may have found. Using analytics, the experts rapidly separate the data into two broad categories: data unlikely to contain personal, sensitive or confidential information, and data that requires more careful scrutiny. The OpenText team refers to this methodology as rapid analytic investigative review, or RAIR.
Pull quote: “We know that potentially a very large swath of data has been exfiltrated but when we go in to analyze what was taken we are not just going to throw bodies at the project to review the entire data corpus.” Tracy Drynan, Principal Consultant, OpenText Recon Investigations service
In the initial phase, RAIR involves a highly skilled data interrogation expert spending some time to become immersed in the data. The RAIR expert is trained to rapidly gain an understanding of the underlying structure and patterns within the dataset to efficiently slice through large swaths of the data, separating data that likely contains personal information from that which can be ignored. Mundane business communications, for example, are unlikely to be of concern.
The RAIR expert uses all available analytic tools and knowledge of the organization, including communication analysis tools, domain analysis and knowledge of the organization’s structure and communication patterns, to isolate a much smaller data set for review by the analysis and reporting team.
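As a rough illustration of the domain analysis mentioned above, the sketch below buckets email messages by sender domain and routes them into two broad categories. It is not the RAIR methodology or any OpenText tooling. The function names, the sample idea of a hand-maintained list of low-risk domains, and the choice to key on the From header alone are all assumptions made for the example.

```python
from collections import Counter
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Return the lower-cased domain of a From header, or '' if none is found."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower() if "@" in address else ""


def triage_by_domain(from_headers, low_risk_domains):
    """Split messages into 'likely ignorable' and 'needs scrutiny' by sender domain.

    low_risk_domains is a hypothetical, analyst-maintained set (for example,
    newsletter or automated notification senders). Anything not on the list,
    including messages with no parseable sender, goes to the scrutiny bucket.
    """
    ignorable, scrutiny = [], []
    for header in from_headers:
        bucket = ignorable if sender_domain(header) in low_risk_domains else scrutiny
        bucket.append(header)
    return ignorable, scrutiny


def domain_counts(from_headers):
    """Count messages per sender domain to show where the volume sits."""
    return Counter(sender_domain(h) for h in from_headers)
```

Defaulting unknown senders to the scrutiny bucket is a deliberately cautious choice: in a breach context it is safer to over-review than to discard data that was never examined.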
In the second phase, a small data analysis and reporting team conducts a targeted review and extracts the critical details, the what and the who, for the purposes of creating the data breach report. Here too, the team leverages all the textual and pattern recognition analytics available to identify and extract personal information with maximum speed and accuracy.
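To make the idea of pattern recognition concrete, here is a minimal sketch of one common technique: finding candidate payment card numbers with a regular expression and discarding those that fail the Luhn checksum. It is an illustration only, not a description of the tools the team uses. The pattern, the function names, and the 13 to 19 digit length range are assumptions, and a real review would cover many more categories of personal information.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step matters because long digit runs (invoice numbers, tracking codes) are common in business data; filtering them out reduces the number of false hits a human reviewer has to dismiss.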
The key is smart people using smart technology to solve puzzles and spot patterns, but there is no easy button. Both the technology and the case law in this complex area are in their infancy.
What about leveraging technology-assisted review?
Unlike litigation eDiscovery matters (outbound or inbound review for production), even advanced forms of technology-assisted review (TAR) that leverage continuous machine or continuous active learning protocols are of limited use in data breach analysis and reporting. Here’s why.
What is ‘reasonable’ for breach analysis?
The acceptance of TAR in the legal profession is premised on what is now a widely accepted standard of reasonableness for document review for production in litigation or regulatory investigation matters. However, in data breach analysis and reporting there is not yet any guidance (from courts or think-tanks such as the Sedona Conference) on what constitutes a reasonable standard.
While being 98% correct is more than reasonable and diligent in an eDiscovery review for production paradigm, is it acceptable in the context of breach response if 2% of all the personally identifiable information (health information, credit card numbers, social insurance numbers) goes unreported?
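The arithmetic behind that question is simple but worth spelling out. The sketch below treats “98% correct” as a 98% recall rate over records that contain personal information, which is an assumption for illustration, as are the function name and the record count used in the note that follows.

```python
def expected_missed(records_with_pii: int, recall: float) -> int:
    """Number of PII-bearing records a review with the given recall would miss."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return round(records_with_pii * (1.0 - recall))
```

Under those assumptions, a hypothetical data set with 50,000 records containing personal information gives expected_missed(50_000, 0.98) == 1000, meaning a thousand records, each potentially tied to an affected individual, would go unreported.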
The path forward
Until such time as the technology and the law around data breach analysis and reporting become more settled, there are more questions than there are answers. What a RAIR approach does in this evolving and complex landscape is provide a defensible and practical way to answer the what and who questions of data breach analysis and reporting.
Published December 2, 2022.