By Jennifer Kennedy Park and Scott Reents, Cleary Gottlieb Steen & Hamilton LLP
The protocol Judge Peck approved generally followed the outline above, but it also included provisions to increase transparency and encourage cooperation between the parties. Under the terms of the protocol, counsel for the producing party—the defendant—is required to share with plaintiffs' counsel all non-privileged documents (whether responsive or not) and the coding from the initial seed set of documents and the seven subsequent rounds of review (of 500 documents each) that will be used to train the computer.5 Plaintiffs' counsel is permitted to review these documents and provide the defendant with its own evaluation of the coding.6 The parties are expected to attempt to resolve any disputes related to the coding of documents;7 however, Judge Peck is available to resolve any intractable conflicts.8 Documents identified as responsive by the computer will be subject to further review by defendants prior to production,9 but documents identified as not responsive by the computer will receive no human review and will not be produced.
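The training rounds described above can be pictured as a simple loop. The following is a minimal sketch only, not the procedure used in Da Silva Moore: the function names (`run_training_rounds`, `human_code`, `fit`) are hypothetical, documents are represented as plain strings, and random selection of each batch is an assumption, since the protocol's actual selection method is not described here.

```python
import random

def run_training_rounds(collection, seed_set, human_code, fit,
                        rounds=7, batch_size=500, rng_seed=0):
    """Code a seed set, then run a fixed number of review rounds.

    human_code(doc) -> bool   stands in for a reviewer's responsiveness call.
    fit(coded)      -> model  stands in for whatever the platform does to
                              retrain on all coding decisions made so far.
    Both are hypothetical placeholders supplied by the caller.
    """
    rng = random.Random(rng_seed)
    # Every coding decision is kept so it can be shared with the other side.
    coded = {doc: human_code(doc) for doc in seed_set}
    model = fit(coded)
    for _ in range(rounds):
        uncoded = [doc for doc in collection if doc not in coded]
        if not uncoded:
            break  # nothing left to review
        # Assumption: each round's batch is drawn at random from uncoded documents.
        batch = rng.sample(uncoded, min(batch_size, len(uncoded)))
        for doc in batch:
            coded[doc] = human_code(doc)
        model = fit(coded)  # retrain on the enlarged set of coded documents
    return model, coded
```

The returned `coded` mapping is the record that, under a protocol of this kind, would be disclosed (minus privileged documents) so that the opposing party can evaluate the coding.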
As Judge Peck recognized, predictive coding can be a valuable tool in civil litigation. First, in terms of production accuracy, predictive coding “works better than most of the alternatives, if not all of the [present] alternatives.”10 Indeed, predictive coding “‘can (and does) yield more accurate results than exhaustive manual review, with much lower effort.’”11 Predictive coding has also been shown to be more accurate than keyword searches.12 Related to its improved accuracy, predictive coding has the potential to reduce the costs of complying with discovery requests that involve the production and review of large amounts of ESI.13
Predictive coding promises similar advantages in regulatory enforcement matters. Document review and production is generally a significant, if not the predominant, component of such matters at the initial stages. Thus, the technology presents an opportunity to reduce the cost of responding to requests from regulatory authorities at least as much as, if not more than, in civil litigation. However, regulatory enforcement proceedings are not civil litigation, and the significant differences between the two should be considered before deciding to use predictive review as an aid to responding to regulatory requests.
Abbreviated timelines can also complicate predictive coding insofar as they require rolling collections and productions of documents. When production deadlines are short, parties will often need to begin review before the full collection of documents is complete. Many predictive coding platforms work less well when collections are loaded on a rolling basis because the early training of the computer is based on an incomplete set of data. When documents are later added to the collection, additional training must be undertaken to account for the new documents. Rolling productions, which are also a common way of responding to regulatory requests, present a similar problem. If a regulator wants certain custodians or time periods prioritized for review and production, the predictive coding process may need to be run separately for each prioritized set, increasing the amount of time spent training the computer and complicating the overall workflow.
All of that said, predictive coding can be useful in regulatory investigations even if it does not ultimately reduce the time or person-hours it takes to prepare productions, because predictive review can shorten the amount of time needed for counsel to become familiar with the facts. The time it takes to become familiar with the underlying facts in a matter and identify critical documents is particularly important in enforcement matters. Quickly learning the facts of a matter puts counsel in a better position to negotiate with a regulator about the size and scope of the production request, focusing the regulator's attention on the relevant time periods, custodians, and issues, and potentially limiting overly broad requests.
Similarly, regulators are almost certain to want to sign off on the use of predictive coding before permitting its use to narrow the set of documents reviewed and produced. Thus, the use of predictive coding for production is essentially impossible without a regulator who understands and trusts the technology and is informed enough to vet a proposed methodology. As the technology is still in its infancy, many regulators may be hesitant to authorize use of a technology with which they do not have significant experience. On the other hand, some regulators are already agreeing to the use of predictive coding in specific matters, and one government agency has publicly acknowledged the importance of this and other new technologies in modern review. In proposed revisions to its rules, the Federal Trade Commission states that the parties responding to its requests may potentially “utilize one or more search tools such as advanced key word searches, Boolean connectors, Bayesian logic, concept searches, predictive coding, and other advanced analytics.”14
Regulators who are open to the use of predictive coding may require assent not only to the decision to use predictive coding, but also to the specific methodological details, such as how the seed set is generated, how many training iterations are used, and what sampling is done to confirm the accuracy of the review. Regulators may go further and seek involvement similar to that permitted of the plaintiffs in the Da Silva Moore protocol—the right to review and challenge the producing party's coding of specific documents. This level of transparency could make regulators more comfortable with the review process because it exposes the criteria counsel uses to distinguish responsive from non-responsive documents. That said, this level of transparency, which is not typical in a linear review, comes with risks for producing parties, including the potential expansion of the regulator's investigation and document requests into new areas as a result of reviewing the non-responsive documents in the seed sets.15
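One methodological detail a regulator might ask about is the sampling used to confirm accuracy. The sketch below illustrates one common approach, offered here as an assumption rather than as any regulator's requirement: draw a random sample from the documents the computer marked non-responsive, have a human check them, and report the observed rate of missed responsive documents with a Wilson score confidence interval. The function names and the `is_responsive` callback are hypothetical.

```python
import math
import random

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def estimate_missed_rate(discard_pile, is_responsive, sample_size, seed=0):
    """Estimate how many responsive documents sit in the non-responsive pile.

    discard_pile  : documents the model predicted to be non-responsive
    is_responsive : hypothetical stand-in for a human reviewer's judgment
    Returns (observed_rate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    sample = rng.sample(discard_pile, min(sample_size, len(discard_pile)))
    hits = sum(1 for doc in sample if is_responsive(doc))
    low, high = wilson_interval(hits, len(sample))
    return hits / len(sample), low, high
```

Under these assumptions, a sample of 100 discarded documents containing no responsive ones would still leave an upper bound of roughly 3.7 percent, which illustrates why sample size is itself a point a regulator may want to negotiate.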
Finally, a producing party must consider the requirement by many regulators that a party certify the completion of its document production. In civil litigation, an agreement by the opposing party to a particular search methodology is effectively an acknowledgement that such a methodology satisfies the obligations imposed by the relevant rules of civil procedure, which typically boil down to a reasonableness standard. By contrast, even if a regulator has agreed up front to permit the use of predictive review, a regulator is unlikely to concede the sufficiency of the methodology for purposes of a producing party certifying that the production is complete. Regulators have substantial discretion over whether to certify a production, and even a preliminary decision not to certify completion could cause significant delay in the resolution of an investigation.16 Even where counsel procures a regulator's prior agreement to the use of the technology and agreement that certification will be accepted using such technology, counsel (and the client) too must have sufficient understanding of and trust in predictive coding to be comfortable certifying the completeness of its productions.
Where a regulator does not permit a relevance screen on a production, predictive coding clearly has no role to play in determining whether a document should be produced. However, it can still be useful as an efficient way to understand substantively what is being produced. Counsel could, for example, use predictive coding to highlight documents that are likely to be relevant, while still reviewing each and every document that is going to be produced. Or counsel could go a step further and use predictive coding to limit its review only to the documents the computer predicts are relevant, even while producing the larger universe of documents to the regulator. Of course, this means that counsel would be producing documents to a regulator that no human being had actually laid eyes on.
To the extent that counsel is producing documents without human review, there is the risk that the regulator will find documents and facts about which counsel is not fully informed. Counsel taking this approach must therefore have substantial confidence that its technology and process are sound, because the consequences of a mistake are not the typical consequences of a production that contains some irrelevant documents or is missing some relevant ones, but the arguably more damaging consequence of a production that contains relevant (and potentially important) documents about which counsel is entirely unaware. On the other hand, there is always a risk that human reviewers will miss or misunderstand the significance of important documents. Indeed, as Da Silva Moore notes, there is evidence that predictive review is actually more accurate than traditional human review, so the use of predictive coding may not necessarily increase the risk of missing important documents.19 Nevertheless, counsel should proceed cautiously when pursuing a strategy of producing documents without any human review.
While a regulator may not permit a relevance screen on a production, it generally does (and must) permit a screen for privileged material. Predictive coding has not been as rigorously tested as a device for identifying potentially privileged documents. Since the technology works by identifying documents that are topically similar to one another, it may not be as effective at identifying privileged documents, where determinations often turn not on topical similarity of documents, but rather on very specific (and subtle) contextual differences between documents, such as whether a lawyer is included on a distribution list for purposes of seeking that lawyer's legal advice or for some other, non-legal purpose. While predictive coding is arguably valuable as a means of identifying potentially privileged material, it should probably be paired with more traditional methods of identifying potentially privileged material, such as the use of search terms.
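The pairing of predictive coding with traditional methods can be sketched as a screen that flags a document for human privilege review if any one of several signals fires. This is an illustration under stated assumptions only: the `Document` fields, the `privilege_score` value (a hypothetical model output between 0 and 1), the 0.5 threshold, the search terms, and the list of counsel addresses are all invented for the example.

```python
import re
from dataclasses import dataclass, field

# Illustrative search terms only; a real list would be tailored to the matter.
PRIVILEGE_TERMS = re.compile(
    r"\b(privileged|attorney[- ]client|legal advice|work product)\b",
    re.IGNORECASE,
)

@dataclass
class Document:
    doc_id: str
    text: str
    recipients: list = field(default_factory=list)
    privilege_score: float = 0.0  # hypothetical classifier output in [0, 1]

def privilege_review_reasons(doc, counsel_addresses, threshold=0.5):
    """Return the reasons (possibly none) a document should get human privilege review.

    counsel_addresses is assumed to be a set of lowercase email addresses.
    """
    reasons = []
    if doc.privilege_score >= threshold:
        reasons.append("model_score")
    if PRIVILEGE_TERMS.search(doc.text):
        reasons.append("search_term")
    if any(r.lower() in counsel_addresses for r in doc.recipients):
        reasons.append("counsel_recipient")
    return reasons
```

Because the signals are combined with a logical OR, a document the model scores as unlikely to be privileged is still routed to a human if it hits a search term or includes a lawyer on the distribution list. A flag here only triggers review; whether the lawyer was copied for legal advice or for some other purpose remains a human judgment.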
Where documents are being produced without human review, counsel should also consider its client's risk tolerance for inadvertent disclosure of privileged documents. Federal Rule of Evidence 502 limits the risk that an inadvertent disclosure to a federal regulator will result in a subject-matter waiver, removing the threat of perhaps the most damaging consequence of an inadvertent disclosure.20 Nevertheless, inadvertent disclosure could waive privilege with respect to the documents disclosed and, regardless, could reveal sensitive information that would not have otherwise been shared with the regulator. The risk of inadvertent production can be mitigated to some extent if the producing party has the ability to claw back privileged documents after production. However, regulators do not typically enter into claw-back agreements prior to production and some may resist claw-back of inadvertently produced documents entirely. Given these risks, counsel should investigate the legal and ethical duties related to the return or destruction of privileged documents in the pertinent jurisdiction before using predictive coding to conduct a privilege review.
• Consider whether the document volume, timing, and collection logistics will allow for predictive coding. Extremely expedited schedules, especially when combined with rolling collections and productions, may not be ideal situations for the use of predictive review.
• Consider whether the regulator has experience and comfort with predictive coding. A less sophisticated regulator may be less likely to agree to predictive coding and/or more likely to refuse to or delay accepting a certification of completeness.
• Make sure you have enough comfort with predictive coding to be able to make a certification of completeness. Even if the regulator has agreed to predictive coding, you may still be required to certify independently to the efficacy of the methodology.
• Consider what aspects of the methodology will require agreement with the regulator. A highly transparent protocol such as that used in Da Silva Moore could complicate the review and open the door for an expanded inquiry. An alternative protocol might provide agreement on other details of the methodology—numbers, confidence intervals, or general relevance guidelines—to ease any concerns about the technology being a “black box,” while not being as intrusive as the Da Silva Moore protocol.
• When the regulator insists on productions without relevance review, consider other methods in addition to predictive coding to identify privileged documents. Also consider your risk tolerance for inadvertent disclosure and your ability to claw back any inadvertently produced documents.
Jennifer Kennedy Park is a partner based in the New York office of Cleary Gottlieb Steen & Hamilton LLP, where she focuses on white collar defense and corporate investigations, as well as litigation, particularly related to capital markets transactions.
Scott Reents is the ediscovery attorney in the New York office of Cleary Gottlieb Steen & Hamilton LLP, where he advises clients and the firm on electronic discovery law, technology, and best practices.
The authors would like to thank Peter H. Fielding for his invaluable assistance in preparing this article.
This document and any discussions set forth herein are for informational purposes only, and should not be construed as legal advice, which has to be addressed to particular facts and circumstances involved in any given situation. Review or use of the document and any discussions does not create an attorney-client relationship with the author or publisher. To the extent that this document may contain suggested provisions, they will require modification to suit a particular transaction, jurisdiction or situation. Please consult with an attorney with the appropriate level of experience if you have any questions. Any tax information contained in the document or discussions is not intended to be used, and cannot be used, for purposes of avoiding penalties imposed under the United States Internal Revenue Code. Any opinions expressed are those of the author. The Bureau of National Affairs, Inc. and its affiliated entities do not take responsibility for the content in this document or discussions and do not make any representation or warranty as to their completeness or accuracy.
©2014 The Bureau of National Affairs, Inc. All rights reserved. Bloomberg Law Reports ® is a registered trademark and service mark of The Bureau of National Affairs, Inc.