Everybody is concerned with limiting the amount of data that needs to be reviewed. It’s the number one cost in e-discovery, after all. Clearly, the best way to cut costs is to review less, and there is a technique that we software engineers use every day that can drastically reduce the number of documents in review: a concept from CS101 called normalization. Essentially, it’s a mechanism for ensuring the integrity and efficiency of a database, and it provides a framework for specifying the degree to which redundant data is stored. Our goal is to remove redundant data while providing an elegant means of searching and reviewing hierarchical document sets in context.
So here is the typical example we see in e-discovery review.
Email 1 has two attachments, Attachment 1 and Attachment 2
Email 2 has one attachment, Attachment 1
There are much more complex examples of this as well, but this one illustrates the point just fine. Attachment 1 is a duplicate file, meaning it is literally the exact same file, with exactly the same metadata and the same MD5 hash. The only difference is that in one case it’s attached to Email 1 and in the other it’s attached to Email 2.
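As a rough sketch of how such duplicates can be spotted, the Python snippet below hashes each file's contents with MD5 and groups files whose digests match. The names `md5_of` and `group_duplicates` and the sample byte strings are hypothetical, chosen only for illustration. The hash covers file contents only, so metadata would need to be compared separately.

```python
import hashlib
from collections import defaultdict

def md5_of(content: bytes) -> str:
    """Return the hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(content).hexdigest()

def group_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by the MD5 digest of their contents."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[md5_of(content)].append(name)
    return dict(groups)

# Hypothetical stand-ins for the three attachment copies in the example.
files = {
    "email1/attachment1": b"Q1 financials",
    "email1/attachment2": b"Q1 goals",
    "email2/attachment1": b"Q1 financials",
}

groups = group_duplicates(files)
# Keep only digests shared by more than one file.
duplicates = [names for names in groups.values() if len(names) > 1]
```

Here `duplicates` holds a single group containing the two copies of Attachment 1, which is exactly the pair a normalized database would store once.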
Most review applications keep duplicate copies of the attachment and group each copy with its parent email, essentially flattening the attachment into the email and creating multiple versions of the same pages. If you choose (or are forced) to review Attachment 1 as two completely different documents in this manner, as most common review tools require, you will significantly increase the time it takes to review, tag, and code that document. Not only that, but you also increase the chance that a redaction is not applied to every version of that document.
That said, it’s clearly important to look at emails and their attachments together to get the full context of the material. Using the above example, we’ll add titles to each of the docs.
Email 1, “Q1 summary”, has two attachments: Attachment 1, Q1_financials.doc, and Attachment 2, Q1_goals.doc
Email 2, “Here are the falsified numbers you asked for Jim”, has one attachment: Attachment 1, Q1_financials.doc
In a normalized review database, Attachment 1 is stored once and linked to both emails. So even if Email 1 and Email 2 are assigned to different reviewers and both have to review Attachment 1, the redactions, tagging, and coding applied by one reviewer are available to the other, which makes the process more efficient than working on that attachment from scratch. You’ve saved yourself a big chunk of time. And in an ideal situation, the same reviewer will already be familiar with that document and will see the additional context of both emails during his review. As he’s reviewing Q1_financials.doc, he’ll know that it was sent out in a Q1 summary email and also in an email announcing that the numbers were falsified. So it’s not only more efficient, it actually sets a greater context for the document and provides additional valuable metadata.
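To make the idea concrete, here is a minimal sketch of a normalized layout using Python's built-in `sqlite3` module. The table names, column names, and placeholder hash values are assumptions made for illustration, not the schema of any particular review tool. Each attachment is stored once, and a link table records which emails it belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT);
    -- One row per unique file; coding lives here, not on a per-email copy.
    CREATE TABLE attachments (
        id INTEGER PRIMARY KEY, md5 TEXT UNIQUE, filename TEXT, coding TEXT
    );
    -- Link table: the many-to-many relationship between emails and files.
    CREATE TABLE email_attachments (
        email_id INTEGER REFERENCES emails(id),
        attachment_id INTEGER REFERENCES attachments(id),
        PRIMARY KEY (email_id, attachment_id)
    );
""")
conn.executemany("INSERT INTO emails VALUES (?, ?)", [
    (1, "Q1 summary"),
    (2, "Here are the falsified numbers you asked for Jim"),
])
# The md5 values are placeholders, not real digests.
conn.executemany(
    "INSERT INTO attachments (id, md5, filename) VALUES (?, ?, ?)",
    [(1, "hash-a", "Q1_financials.doc"), (2, "hash-b", "Q1_goals.doc")],
)
conn.executemany("INSERT INTO email_attachments VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# A reviewer working from Email 1 codes the attachment once.
conn.execute("UPDATE attachments SET coding = 'responsive' WHERE id = 1")

def attachments_for(email_id):
    """Attachments (with their shared coding) linked to one email."""
    return conn.execute(
        "SELECT a.filename, a.coding FROM attachments a "
        "JOIN email_attachments ea ON ea.attachment_id = a.id "
        "WHERE ea.email_id = ? ORDER BY a.id", (email_id,)).fetchall()

def emails_for(attachment_id):
    """Subjects of every email that carries the given attachment."""
    rows = conn.execute(
        "SELECT e.subject FROM emails e "
        "JOIN email_attachments ea ON ea.email_id = e.id "
        "WHERE ea.attachment_id = ? ORDER BY e.id", (attachment_id,))
    return [subject for (subject,) in rows]
```

In this sketch, `attachments_for(2)` already shows the coding applied while reviewing Email 1, and `emails_for(1)` returns both email subjects, which is the added context described above.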
Unfortunately, most review tools that we’ve seen don’t support this normalized relational database technique, but our hope is that, as more review tools extend to support native file formats, this technique for organizing review databases will be available to all. Our simple and powerful review platform maintains the relationships between all documents and allows for flattening or “reduping” on export or production. It’s an approach that really provides the best of both worlds.
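Flattening on export can be pictured as the reverse of normalization. The Python sketch below is an illustration under assumed data structures, not the platform's actual export code. The function name `flatten_for_export` and the record fields are hypothetical. It expands each email-to-attachment link back into its own row, so every copy carries the coding that was applied once.

```python
# Normalized store: each unique attachment appears exactly once.
attachments = {
    "att1": {"filename": "Q1_financials.doc", "coding": "responsive"},
    "att2": {"filename": "Q1_goals.doc", "coding": None},
}

# Emails reference attachments by id instead of holding their own copies.
emails = [
    {"subject": "Q1 summary", "attachment_ids": ["att1", "att2"]},
    {"subject": "Here are the falsified numbers you asked for Jim",
     "attachment_ids": ["att1"]},
]

def flatten_for_export(emails, attachments):
    """Emit one row per (email, attachment) pair, duplicating shared files."""
    rows = []
    for email in emails:
        for att_id in email["attachment_ids"]:
            att = attachments[att_id]
            rows.append({
                "parent_subject": email["subject"],
                "filename": att["filename"],
                "coding": att["coding"],
            })
    return rows

export_rows = flatten_for_export(emails, attachments)
```

The two unique attachments become three export rows, and both rows for Q1_financials.doc carry the same coding because it was recorded in one place.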


