Topiary Discovery LLC

Powered by: Topiary Discovery LLC ... The Home of Predictive Pruning®

Sunday, October 30, 2011

The Google v. Oracle Email Slip: Case in Point for Creative Analytics & Work Flow


 
PC World and other media outlets reported on the legal wrangling surrounding the accidental production of critical emails that should have (apparently and clearly) been withheld as protected attorney-client communications.

According to the article, earlier versions of what would be considered in litigation circles “hot document” emails were inadvertently produced. Only the final version, which had been labeled as privileged, was withheld. The article paints a dim picture for Google’s efforts to claw the earlier versions back.

Details as to how this happened are sketchy, but it appears that the screen employed to identify privileged documents consisted of some combination of a content search for language such as “attorney client privilege” OR certain parties identified in the email metadata.

That’s pretty standard; but the error, and its consequence, serve as good learning opportunities on two fronts.

First, the slip establishes a place where technology can not only drive efficiency but also reduce the risk of human fault in process design or operation (rather than create risk, which is the normal concern), at least when it is used with a work flow that optimizes the technology’s application.

Here’s a suggested work flow to prevent this type of hazard:

  • Create near-duplicate cluster IDs for all documents in the collection set. 
  • Conduct the standard privilege tests.
  • Collect not only the documents that were “hits” in that set but also all documents (and family members) that match any near-duplicate cluster in the hit set.

Once this is done, the policy decisions can be made as to whether to simply withhold the additional documents, or to review them in light of the privileged member in the cluster.

In this way, documents similar in content to privileged items are caught before production rather than inadvertently disclosed. In addition, a privilege reviewer can make better privilege judgments because he or she can consider all of the documents in context with the similar “privileged” document.
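The three steps of the work flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the near-duplicate cluster IDs are taken as already computed by whatever review platform is in use, and the field names (id, cluster_id, family_id, text), the function name and the simple keyword screen are all hypothetical, not any vendor's actual API.

```python
def expand_privilege_hits(docs, is_privilege_hit):
    """Return (hits, additional) document IDs.

    docs: list of dicts with hypothetical keys 'id', 'cluster_id',
          'family_id' and 'text'. 'cluster_id' is assumed to be a
          near-duplicate cluster ID assigned beforehand (step 1);
          None means the document belongs to no cluster.
    is_privilege_hit: the standard privilege screen (step 2).
    """
    # Step 2: run the standard privilege tests.
    hits = {d["id"] for d in docs if is_privilege_hit(d)}

    # Step 3a: find every near-duplicate cluster touched by a hit.
    hit_clusters = {
        d["cluster_id"] for d in docs
        if d["id"] in hits and d["cluster_id"] is not None
    }
    selected = hits | {d["id"] for d in docs if d["cluster_id"] in hit_clusters}

    # Step 3b: pull in family members (e.g. attachments) of everything selected.
    families = {
        d["family_id"] for d in docs
        if d["id"] in selected and d["family_id"] is not None
    }
    selected |= {d["id"] for d in docs if d["family_id"] in families}

    # 'additional' is what the keyword/party screen alone would have missed.
    return hits, selected - hits


# Hypothetical collection: two drafts and a final version in one cluster.
docs = [
    {"id": "draft1", "cluster_id": 7, "family_id": "A", "text": "draft of the memo"},
    {"id": "draft2", "cluster_id": 7, "family_id": "B", "text": "revised draft of the memo"},
    {"id": "final", "cluster_id": 7, "family_id": "C", "text": "Attorney Client Privilege: memo"},
    {"id": "final-att", "cluster_id": None, "family_id": "C", "text": "attachment"},
    {"id": "other", "cluster_id": 9, "family_id": "D", "text": "lunch plans"},
]

def keyword_screen(doc):
    return "attorney client privilege" in doc["text"].lower()

hits, additional = expand_privilege_hits(docs, keyword_screen)
print(sorted(hits))
print(sorted(additional))
```

In this toy collection only the final version trips the keyword screen, but the two unlabeled drafts and the attachment are swept in for withholding or a second look because they share a cluster or a family with it.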

For the second lesson, assume that the similar hot documents were not privileged at all. And assume that predictive coding with random sample testing was employed to cull out a subset of documents for review and possible production. And assume that none of the similar documents was identified by the predictive engine (if one was missed, likely all would be). And assume that no random sample drawn for testing contained even one of these emails (again likely, given the small number of emails).

Learning question: if the documents did come to light in some fashion, how defensible would the testing process appear to a court if the testing failed to identify the small set of critical documents that, by the judge’s estimate, completely turn the case on its axis? There are ways to formalize the reduction of this risk, but it takes more than random sampling.
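To put a rough number on why a random sample is likely to miss a handful of emails, the chance that a simple random sample drawn without replacement contains none of them can be computed directly. The figures used here (a collection of 1,000,000 documents, a test sample of 1,500 and 3 critical emails) are purely hypothetical and are not taken from the case; the function name is likewise just for illustration.

```python
def prob_sample_misses_all(population, critical, sample_size):
    """Probability that a simple random sample (without replacement)
    contains none of the 'critical' documents.

    Equivalent to C(population - critical, sample_size) / C(population, sample_size),
    computed as a product over the critical documents to avoid huge integers.
    """
    if not (0 <= critical <= population and 0 <= sample_size <= population):
        raise ValueError("critical and sample_size must lie between 0 and population")
    unsampled = population - sample_size
    if critical > unsampled:
        return 0.0  # too few unsampled slots to hold every critical document
    p = 1.0
    for j in range(critical):
        # Chance that the (j+1)-th critical document also falls outside the sample.
        p *= (unsampled - j) / (population - j)
    return p


# Hypothetical numbers, for illustration only.
p_miss = prob_sample_misses_all(population=1_000_000, critical=3, sample_size=1_500)
print(f"Chance the sample misses all three emails: {p_miss:.2%}")
```

Under these assumed numbers the sample misses all three emails roughly 99.5% of the time, which is the sense in which random sampling alone says very little about a small set of critical documents.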
