As an aside, watching (in transcript form) the Court take some of the shellac off the attorneys brings back memories I had successfully suppressed.
Thanks to Tara Lamy for the Law.com posting which contains the transcript:
Is anyone else curious, despite all the talk about sample validity, about the absence of any discussion of whether the document population was normalized, with samples drawn from a reduced corpus in which only one copy from each near-duplicate cluster was represented?
It will be interesting to see, as these systems are put to more transparent use, whether aggregate samples across multiple issues suffice.
The same open question applies to document type classes (emails vs. spreadsheets vs. Word/PDF, etc.). Will separate samples be warranted?
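To make the question concrete, here is a minimal Python sketch of the kind of normalization I have in mind: drawing a validation sample from a corpus reduced to one representative per near-duplicate cluster, with the sample allocated proportionally across document-type strata. The field names (`cluster`, `doctype`) and the proportional allocation are my own illustrative assumptions, not anything drawn from the protocol at issue.

```python
import random
from collections import defaultdict

def normalized_sample(docs, sample_size, seed=0):
    """Draw a validation sample from a corpus reduced to one
    representative per near-duplicate cluster, allocated
    proportionally across document-type strata.

    `docs` is a list of dicts with hypothetical keys:
    'id', 'cluster' (a near-duplicate cluster label), 'doctype'.
    """
    # Normalize: keep only the first copy seen in each cluster.
    reduced = {}
    for d in docs:
        reduced.setdefault(d["cluster"], d)
    reduced = list(reduced.values())

    # Group the reduced corpus by document type.
    by_type = defaultdict(list)
    for d in reduced:
        by_type[d["doctype"]].append(d)

    # Allocate the sample across types in proportion to each
    # stratum's share of the reduced corpus (at least one each).
    rng = random.Random(seed)
    sample = []
    for pool in by_type.values():
        n = max(1, round(sample_size * len(pool) / len(reduced)))
        sample.extend(rng.sample(pool, min(n, len(pool))))
    return sample
```

Whether a single aggregate sample like this suffices, or whether each stratum warrants its own independently sized sample, is exactly the open question.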
And of course, as usual, I argue that sample testing based upon binary relevant/not-relevant (R/NR) classes is inherently inadequate to the diligent exercise of discovery. My reasoning relates to a subtle misstatement by the Court, from the end of page 69 into page 70:
“25 THE COURT: No, I think what they have said is that
1 once the system is fully trained and run, at some point,
2 undetermined and subject to court approval, they are going to
3 say the likely relevance when you have reached X number is too
Actually, as described therein and in the industry, these approaches say nothing about the "smallness" or "largeness" of relevance, only the likelihood, or rather the unlikelihood, of documents remaining undetected that would, if reviewed, be coded "relevant." Currently, the testing is absolutely silent as to what level of relevance lies within the remaining 5% or 1%.
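The kind of stopping test being described is, in rough form, an elusion check: sample the unreviewed discard population and put an upper confidence bound on the fraction of still-undetected relevant documents. A minimal sketch (my own simplification, using the "rule of three" for a zero-hit sample and a normal approximation otherwise) shows why such a test is binary to its core; whatever the bound, it says nothing about the level of relevance of what lies beneath it.

```python
import math

def elusion_upper_bound(sample_size, relevant_found):
    """Approximate 95% upper confidence bound on the fraction of
    relevant documents remaining in the unreviewed (discard)
    population, given a random sample of `sample_size` documents
    in which `relevant_found` were coded relevant.
    """
    if relevant_found == 0:
        # Rule of three: ~95% upper bound when a sample of n shows 0 hits.
        return 3.0 / sample_size
    p = relevant_found / sample_size
    z = 1.96  # normal quantile for a ~95% bound
    return p + z * math.sqrt(p * (1 - p) / sample_size)
```

Note that the bound is computed from binary relevant/not-relevant codes alone; a hot document and a marginally responsive one count identically, which is precisely my complaint.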
I would argue that framing "relevance" as a static, binary measure does a great disservice to the pursuit of sound discovery. (See William Webber's response to my comment about the need for relevance levels here.)
It may be that in many cases hot documents will remain undetected and forever unknown. This opacity confers only a false imprimatur on the process. It will survive until, and perhaps this may happen here, a party who has received a production from an opposing party produces some "kill shot" company documents (e.g., ones their own clients squirreled away) that the semi-automated review process failed to deliver. Then the questions about relevance levels and testing will begin.
It may therefore also be that, absent a supplemental approach that identifies and predictively models for highly relevant documents (the high-octane ones), defensible discovery has not been accomplished.
Finally, on a related note, it has become de rigueur to note, based I think upon TREC studies and the like, that semi-automated reviews, when done properly (hmm?), out-perform human review. And so from the transcript:
8 THE COURT: It certainly works better than most of the
9 alternatives, if not all of the alternatives. So the idea is
10 not to make this perfect, it's not going to be perfect. The
11 idea is to make it significantly better than the alternative
12 without nearly as much cost.
However, I don't believe there have been any studies that have sought to compare predictive-engine recall and precision against human ability at identifying specifically that class of items that, in the real world, moves the ball forward, so to speak, or, to be more germane, moves the needle on settlement. Based upon experience, my intuition is that humans do that better, and do it without the intra-team inconsistencies documented in those studies.