Predictive coding is a promising market because discovery always gets budget, but its adoption is hindered in part by defensibility concerns. Information governance carries little defensibility risk but suffers from a lack of budget. Information governance sales also suffer from the mistaken view that the undertaking must be carried out at the enterprise level. Much has been made of the logic of extending predictive coding from ediscovery alone to information governance, yet much of that discourse has given little thought to how companies actually adopt (or should adopt) analytics. This is a shame, because there is a persuasive argument for sophisticated content-based analytical solutions that provide a better return on ediscovery spending while enabling a corporation to begin the path towards information governance in a conservative, gradual manner.
As it stands now, predictive coding and information governance are distinct beasts. Predictive coding is episodic; information governance, abiding. Predictive coding is reactive; information governance, proactive. Yet the distinction presents an opportunity for the two to complement each other. That opportunity is not the commonly asserted simple repositioning of predictive coding for general information governance use. Rather, it contemplates discrete, targeted information governance applications for security and ediscovery, integrating information governance modules specifically and exclusively with ediscovery and information security solutions. This lets a company explore the value of analytics by using already available budget to solve multiple bottom-line challenges.
The solution space at the intersection of these two undertakings has some precedent in information security solution proposals. Near real-time protection against exfiltration of valuable intellectual property in the form of unstructured text is one example of a targeted information governance solution already in use. The information security research community has detailed a proposed architecture for such a protection scheme based on near-duplicate detection (cosine similarity analysis). It uses a text indexing server, a Squid security server and a “content-comparer” server. The solution can block the outbound transmission of any item whose index “signature” is highly similar to a signature in the library of high-value documents.
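To make the near-duplicate check concrete, here is a minimal sketch of the comparison step only, assuming each document's “signature” is a plain term-frequency vector and that items are blocked when their cosine similarity to any library document meets a threshold. The names `ContentComparer`, `register`, `check`, `signature` and the 0.8 `BLOCK_THRESHOLD` are hypothetical illustrations, not taken from the published architecture, and the text indexing and Squid interception layers are not modeled at all.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical similarity cutoff; a real deployment would tune this value.
BLOCK_THRESHOLD = 0.8


def signature(text: str) -> Counter:
    """Build a simple term-frequency "signature" from lowercased word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ContentComparer:
    """Hypothetical stand-in for the "content-comparer" role: it holds the
    signature library of high-value documents and scores outbound items."""

    def __init__(self, threshold: float = BLOCK_THRESHOLD) -> None:
        self.threshold = threshold
        self.library: Dict[str, Counter] = {}

    def register(self, doc_id: str, text: str) -> None:
        """Add a high-value document's signature to the library."""
        self.library[doc_id] = signature(text)

    def check(self, outbound_text: str) -> Tuple[bool, Optional[str], float]:
        """Return (block?, closest library document id, best similarity)."""
        outbound = signature(outbound_text)
        best_id: Optional[str] = None
        best_score = 0.0
        for doc_id, sig in self.library.items():
            score = cosine_similarity(outbound, sig)
            if score > best_score:
                best_id, best_score = doc_id, score
        return best_score >= self.threshold, best_id, best_score


if __name__ == "__main__":
    comparer = ContentComparer()
    comparer.register("catalyst-spec", "Proprietary catalyst formula: nickel ratio three to one")
    print(comparer.check("FYI the catalyst formula uses a nickel ratio of three to one"))
```

Raw term frequencies are the simplest possible signature; weighting schemes such as TF-IDF or word shingles would typically make the comparison less sensitive to common words and reordering, at the cost of maintaining corpus statistics.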
These components could provide the architecture for an integrated information governance/ediscovery predictive analytics solution, one that also delivers information security value. The information security side would benefit from a more robust document comparison methodology, while the ediscovery/information governance side would gain a much richer set of input data from which to predict relevance.
In addition, this type of combined security/information governance solution could be extended to ediscovery challenges such as automatically identifying, and managing the disclosure of, corporate documents subject to non-disclosure and confidentiality agreements, as well as documents covered by privilege or work product protection.
If you would like more information on this type of approach, contact me at firstname.lastname@example.org