Archiving Is For E-discovery; Backup Is For Recovery

Tuesday, September 1, 2009 - 01:00

"Not a dark or novel art" might sound like a ruling from the Wizengamot Court in the Harry Potter universe, but it actually came from a ruling denying safe harbor for an evidence spoliation claim. The dark art referred to by the court is Information Management, the process and technology organizations use to acquire, retain, hold and ultimately expire Electronically Stored Information (ESI). Court rulings like this are why you should care about information management and the difference between backup and archiving.

The challenges surrounding the discovery of information from backup tapes were thrown into stark relief in the now-infamous Zubulake v. UBS Warburg LLC case. The Committee Note to Rule 26 affirms that computer backup tapes are subject to legal hold, and may be discoverable. If backup tapes are used for information retrieval then they may be accessible for the purposes of e-discovery. The problem is, backups were never designed for e-discovery, and it shows.

In Toussie v. County of Suffolk, the county argued that searching its backups was overly burdensome. The court narrowed the search request to 35 terms, but it still required an estimated 470 backup tape restorations at a cost of between $400,000 and $900,000. Why so much? Computer backups are snapshots of a computer system at a particular point in time, so that you can recover the system back to this point in the event of a failure or other disaster. The entire backup process is optimized around that concept of a computer system image at a singular point in time - whereas e-discovery is about finding specific information within a particular time period. It's a bit like being asked to find every photo of Aunt Petunia in a green dress during the 1970s when faced with 47 shoe-boxes of family snapshots covering those years. The only option is to open all the shoe-boxes and look at all the images. In this particular case, narrowing the search terms didn't help the County: with no way of knowing which backup tapes matched the terms, restoring the tapes was the only way to find out.
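To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the numbers quoted above and assumes, purely for illustration, that the cost is spread evenly across the 470 restorations; the function name is made up for this sketch.

```python
# Figures quoted above for Toussie v. County of Suffolk.
TAPE_RESTORATIONS = 470
COST_LOW_USD = 400_000
COST_HIGH_USD = 900_000


def cost_per_restoration(total_cost_usd: float, restorations: int) -> float:
    """Average cost of one restoration, assuming costs are spread evenly."""
    return total_cost_usd / restorations


low = cost_per_restoration(COST_LOW_USD, TAPE_RESTORATIONS)
high = cost_per_restoration(COST_HIGH_USD, TAPE_RESTORATIONS)
print(f"Roughly ${low:,.0f} to ${high:,.0f} per tape restoration")
```

On that assumption, each tape costs somewhere between about $850 and $1,900 to restore before anyone has searched a single message.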

Unfortunately, it isn't just about knowing which backup tapes or images are relevant - it can be worse than that. Most backups are of a particular system configuration, which can mean you need the same computer hardware to complete a successful restore. This presents a challenge for most organizations because they often have two- to four-year hardware refresh cycles - so the chances of finding the same hardware are slim for all but the most recent matters. At this point you're looking at eBay to find old hardware, or specialized backup restoration services.

In general, tapes kept solely for disaster recovery are not likely to be defined as accessible. However, studies indicate that 70 percent of restores from backups are not for disaster recovery - they are to retrieve deleted information. Some organizations have attempted to hide behind accessibility, but as the Toussie case shows, backups are often deemed to be accessible. In that case, the fact that there was no other way to get the data was relevant, leading the court to state "You can't just throw up your hands and say we don't store [e-mails] in an accessible form and then expect everybody to walk away."

The concept of accessibility has become pivotal, and the problem is that many backups may satisfy its definition. In order to determine the accessibility issue, courts may want to know whether a party routinely restores backup tapes (e.g. to test if the tapes still work), or whether tapes have been restored in other situations (e.g. to restore an accidentally deleted file or email). You may also be asked if your projected costs have been compared to those of specialized vendors, who may well turn out to be cheaper due to their experience and tools.

Modern active archiving systems were developed to address these problems. I've asked a room full of IT people if they're doing archiving and nearly every hand went up. For some, though, archiving meant a pile of backup tapes somewhere - in one case, the organization had demolished the employee swimming pool to create more space for them. Active archiving is different: it's a way of centrally managing the storage, retention and hold of information while ensuring "live" (or active) access to any item. Active archives are indexed so that information can be rapidly retrieved for business, regulatory or e-discovery purposes.
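To make the idea of an indexed archive concrete, here is a minimal Python sketch of a keyword index with a date filter. It is an illustration only, not a description of any vendor's product: the class names `ArchivedItem` and `ArchiveIndex` are invented for this example, and the word splitting is deliberately naive.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedItem:
    item_id: int
    sent: date
    text: str


class ArchiveIndex:
    """Toy inverted index: maps each word to the items that contain it."""

    def __init__(self) -> None:
        self._items: dict[int, ArchivedItem] = {}
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, item: ArchivedItem) -> None:
        # Index the item once, at archive time.
        self._items[item.item_id] = item
        for word in item.text.lower().split():
            self._postings[word].add(item.item_id)

    def search(self, term: str, start: date, end: date) -> list[ArchivedItem]:
        # Look up the term directly, then keep only items in the date range.
        ids = self._postings.get(term.lower(), set())
        hits = [self._items[i] for i in ids if start <= self._items[i].sent <= end]
        return sorted(hits, key=lambda item: item.sent)


index = ArchiveIndex()
index.add(ArchivedItem(1, date(1974, 6, 1), "Aunt Petunia in a green dress"))
index.add(ArchivedItem(2, date(1982, 3, 9), "Petunia at the beach"))
index.add(ArchivedItem(3, date(1977, 8, 20), "family picnic"))

hits = index.search("petunia", date(1970, 1, 1), date(1979, 12, 31))
print([item.item_id for item in hits])  # prints [1]
```

The search goes straight to the matching items for the requested period, rather than opening every shoe-box to see what is inside.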

Active archiving works by moving information out of email and other systems into a central repository - the archive. While the information may be gone from the original application, active archiving software works under the covers to provide access to the archived items with minimal disruption to end users. For example, archived emails still show up in your Inbox, and double-clicking on one in Notes or Outlook opens it like any other message. Under the covers, the software has retrieved the item from the archive and fed it back to your email program, with the benefit that you don't have to change the way you access your email. You can reply, forward or otherwise use the message just as before.
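The retrieval behavior described above can be sketched in a few lines of Python. This is a simplified, hypothetical model and does not reflect how Notes, Outlook or any particular archiving product is implemented; the names `Stub`, `Archive` and `Mailbox` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    subject: str
    body: str


@dataclass
class Stub:
    """Small placeholder left in the mailbox after an item is archived."""
    subject: str
    archive_id: str


class Archive:
    """Hypothetical central repository keyed by archive ID."""

    def __init__(self) -> None:
        self._store: dict[str, Message] = {}

    def put(self, archive_id: str, message: Message) -> None:
        self._store[archive_id] = message

    def get(self, archive_id: str) -> Message:
        return self._store[archive_id]


class Mailbox:
    def __init__(self, archive: Archive) -> None:
        self._archive = archive
        self._entries: list[Union[Message, Stub]] = []

    def deliver(self, message: Message) -> None:
        self._entries.append(message)

    def archive_entry(self, position: int, archive_id: str) -> None:
        # Move the full message to the archive and leave a stub behind.
        entry = self._entries[position]
        if isinstance(entry, Message):
            self._archive.put(archive_id, entry)
            self._entries[position] = Stub(entry.subject, archive_id)

    def open(self, position: int) -> Message:
        # The user opens an item the same way whether or not it is archived.
        entry = self._entries[position]
        if isinstance(entry, Stub):
            return self._archive.get(entry.archive_id)
        return entry


mailbox = Mailbox(Archive())
mailbox.deliver(Message("Q3 forecast", "Numbers attached."))
mailbox.archive_entry(0, "item-0001")
opened = mailbox.open(0)
print(opened.body)  # prints "Numbers attached."
```

The point of the design is that `open` looks the same to the caller before and after archiving, which is why end users do not need to change how they work.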

Once in the archive, an item can be controlled according to an information management policy. This is important because it becomes hard to enforce information policies as the volume grows, and this leads to over- or under-retention - which means increased risk and cost. For example, in Phillip M Adams & Associates LLC v. Dell Inc., Adams alleged that the defendant (ASUS) had spoliated relevant evidence. The defendant stated that "its email servers are not designed for archival purposes, and employees are instructed to locally preserve any emails of long term value." The court denied the defendant's safe harbor claim and imposed sanctions, stating "the culpability in this case appears at this time to be founded in [the defendant's] questionable information management practices." It was in this case that the court wrote that "information management policies are not a dark or novel art." While the ruling in the Adams case is the subject of significant debate, the case has focused attention on information management processes and how organizations approach this issue.

Establishing e-discovery preparedness starts with a partnership between the legal function and IT, HR and records management groups working to identify the most relevant information, which typically means financial data and sales & customer information. A key priority has to be quickly getting to a broad-brush policy and getting information under control, as this offers the fastest, greatest return. A complete, detailed set of policies comes through refinement: don't let the perfect policy be the enemy of good, basic information governance. Once information is under automated control - such as in an active archive - basic retention and deletion policies can be implemented and changed as policies are refined, and information becomes quickly accessible for discovery. If all you have is a great policy and no control, then cost and risk remain until you have a scalable, automated method to control email, files and other "unstructured" information.

When forming a partnership with IT, it helps to put yourself in their shoes. While the cost of active archiving can often be recouped in a single case (think of the County of Suffolk's $400,000 to $900,000 backup tape restoration), it also offers immediate benefits to IT. Unstructured information, such as email, instant messages and files, is the fastest-growing storage area in IT according to industry analyst IDC. Because information is growing far faster than storage unit costs are declining, the net effect is that storage purchases are consuming more and more of the IT budget, squeezing out other projects. Active archiving systems can dramatically reduce storage requirements because they can identify duplication or redundancy and eliminate it. Consider the email with a three-megabyte attachment that is received by 100 people, saved to disk 50 times and uploaded 10 times. A modern active archiving system will store that attachment just once, and then simply record that the same file is being used the remaining 159 times. Customers of Symantec's active archiving system report 40-80 percent storage reduction this way, bringing the return on investment under the magic 12-month figure - before any e-discovery cost reduction is included.
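The attachment example can be sketched as single-instance storage keyed by a content hash. This is a minimal Python illustration of the general technique, not a description of how Symantec's or any other product does it; the class `SingleInstanceStore` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib

MB = 1024 * 1024


class SingleInstanceStore:
    """Toy store that keeps one copy of each distinct piece of content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._ref_counts: dict[str, int] = {}

    def save(self, content: bytes) -> str:
        # Identical content always hashes to the same key.
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content  # first copy: store the bytes
        # Every save, including the first, is recorded as a reference.
        self._ref_counts[digest] = self._ref_counts.get(digest, 0) + 1
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())

    def references(self, digest: str) -> int:
        return self._ref_counts.get(digest, 0)


attachment = b"x" * (3 * MB)
copies = 100 + 50 + 10  # received, saved to disk, uploaded

store = SingleInstanceStore()
for _ in range(copies):
    key = store.save(attachment)

print(store.references(key))                 # prints 160
print(store.stored_bytes() // MB, "MB")      # prints 3 MB
print(copies * len(attachment) // MB, "MB")  # prints 480 MB
```

In this toy case the 160 copies occupy 3 MB instead of 480 MB. Real-world savings depend on how much of an organization's data is actually duplicated.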

Active email archiving also provides automated mechanisms for ensuring legal hold. The alternative is to try and hold the item "in place," such as on a laptop or hard drive. While this sounds simple to do, any kind of accidental or deliberate loss means the item is gone and the duty to preserve has been violated. By keeping a central, managed copy of the item in the archive, it can be secured and stored until it is no longer required, and then automatically deleted.
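As a rough illustration of how retention and legal hold interact, the Python sketch below expires items once they pass a retention period, unless a hold is in place. The names `RetentionManager`, `place_hold` and `expire`, and the one-year retention period, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    item_id: str
    archived_on: date
    holds: set[str] = field(default_factory=set)  # matters holding this item


class RetentionManager:
    def __init__(self, retention: timedelta) -> None:
        self._retention = retention
        self._items: dict[str, ArchivedItem] = {}

    def add(self, item: ArchivedItem) -> None:
        self._items[item.item_id] = item

    def place_hold(self, item_id: str, matter: str) -> None:
        self._items[item_id].holds.add(matter)

    def release_hold(self, item_id: str, matter: str) -> None:
        self._items[item_id].holds.discard(matter)

    def expire(self, today: date) -> list[str]:
        # Delete only items past retention that are not under any hold.
        expired = [
            item_id
            for item_id, item in self._items.items()
            if not item.holds and today - item.archived_on > self._retention
        ]
        for item_id in expired:
            del self._items[item_id]
        return expired


manager = RetentionManager(retention=timedelta(days=365))
manager.add(ArchivedItem("a", date(2007, 1, 1)))
manager.add(ArchivedItem("b", date(2007, 1, 1)))
manager.add(ArchivedItem("c", date(2009, 6, 1)))
manager.place_hold("a", "matter-42")

first_pass = manager.expire(date(2009, 9, 1))
manager.release_hold("a", "matter-42")
second_pass = manager.expire(date(2009, 9, 1))
print(first_pass, second_pass)  # prints ['b'] ['a']
```

Item "a" survives the first pass only because of the hold; once the hold is released, the ordinary retention policy applies again and the item is deleted.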

Active archives also dramatically accelerate early case assessment and review. Because the information is already indexed, archives are easily searchable, so there's no need for backup restores or outsourcing of collection and review. This immediate access to information allows in-house and outside counsel to make strategy decisions about a matter before undertaking more expensive and time-consuming discovery efforts.

In summary, courts are becoming less and less tolerant of excuses made during the discovery process that derive from a failure to maintain proper information management processes. Backups were never designed for e-discovery, yet many organizations still rely upon them for information retrieval - and pay the price in time and money. The benefits of active archiving outweigh the costs and risks of the status quo. While there is no one perfect answer, an e-discovery preparedness partnership between legal, IT and records teams can quickly identify basic information management approaches, and then implement those approaches to dramatically cut storage costs, discovery costs and information risk. The net? You don't have to have the wisdom of Dumbledore and a magic wand to deal with e-discovery cost-effectively and quickly.

Mathew Lodge is Senior Director of Product Marketing for Symantec's Information Management group, and before that ran EMEA Product Marketing for Symantec. Previously, he was an expert consultant in a $400m Silicon Valley trade secret lawsuit, and before that led marketing and product management at venture-funded start-ups in the San Francisco area. Prior to that, he was responsible for product management of a $600m router business at Cisco Systems. He holds a master's degree from the University of York, UK and is an alumnus of London Business School.
