Near Duplication & Threading

Document review teams spend too much time reviewing multiple replies to the Christmas party email or the various nearly duplicative versions of the party invite. Lighthouse’s EmailSmart and DupSmart solutions can eliminate this wasted time, reducing review time and costs by 20% to 50%.

Unlike exact duplicates, nearly duplicative documents cannot be removed from a document population using MD5 hash values, because even a one-character difference between two documents produces entirely different hashes. Near-duplicates are also scattered across the document population, so without Smart technology there is no practical way to group them together for one reviewer to view. With DupSmart, Lighthouse groups near-duplicates so that a reviewer can look at a single version of a document along with the differences between it and the similar documents. The reviewer can then code the whole near-duplicate set consistently and in one pass. Given that near-duplicates typically make up between 20% and 40% of a document population, eliminating full review of these documents with Lighthouse’s DupSmart can significantly improve review speed and coding consistency.
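
To illustrate why hash values miss near-duplicates and how similar documents can still be grouped, here is a minimal Python sketch. It is not DupSmart's proprietary method; it assumes a generic technique (word-shingle Jaccard similarity with greedy grouping), and the function names, the sample documents, and the 0.5 threshold are all hypothetical choices made for this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 digest of the text, the kind of value used to remove exact duplicates."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word windows ("shingles") found in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def group_near_duplicates(docs: dict, threshold: float = 0.5) -> list:
    """Greedy grouping: a document joins the first group whose pivot it resembles."""
    groups = []  # each group is a list of ids; the first id is the pivot document
    for doc_id, text in docs.items():
        for group in groups:
            if jaccard(docs[group[0]], text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


# Hypothetical sample documents: two versions of an invite and an unrelated memo.
docs = {
    "invite_v1": "You are invited to the annual Christmas party on Friday at 6 pm in the main lobby",
    "invite_v2": "You are invited to the annual Christmas party on Friday at 7 pm in the main lobby",
    "memo": "Quarterly budget figures are attached for review by the finance committee",
}

# One changed character is enough to make the MD5 values differ.
print(md5_hex(docs["invite_v1"]) == md5_hex(docs["invite_v2"]))  # False

# Similarity-based grouping still puts both invite versions together.
print(group_near_duplicates(docs))  # [['invite_v1', 'invite_v2'], ['memo']]
```

In a sketch like this, a reviewer would open the pivot of each group and see only the differences for the remaining members. Comparing every document against every pivot does not scale to large populations, so production systems generally rely on more efficient indexing.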

Similar to DupSmart, EmailSmart, Lighthouse’s proprietary approach to threading, groups like information together to greatly reduce the number of documents to review. Typical email usage produces a large number of messages that are simply back-and-forth conversations, where each subsequent email contains the entire earlier content. Threading analyzes a set of emails and identifies which emails are completely contained in a later message in the same conversation. By identifying the most inclusive email as a single point of review, 20% to 50% of emails can be excluded from review. With EmailSmart, our clients save time and money.
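
The idea of a "most inclusive" email can be sketched in a few lines of Python. This is not EmailSmart's proprietary algorithm; it is a simplified illustration that assumes replies quote earlier messages verbatim with `>` markers, and the function names and sample thread are hypothetical.

```python
def normalize(body: str) -> str:
    """Lowercase the body, drop leading quote markers ('>'), and collapse whitespace."""
    lines = [line.lstrip("> ").strip() for line in body.splitlines()]
    return " ".join(" ".join(lines).lower().split())


def inclusive_emails(thread: dict) -> list:
    """Return ids of emails whose content is not fully contained in another email."""
    normalized = {eid: normalize(body) for eid, body in thread.items()}
    keep = []
    for eid, text in normalized.items():
        # An email is redundant if a different, longer email contains all of its text.
        # Identical bodies are left alone here; exact duplicates are a separate problem.
        contained = any(
            other_id != eid and text != other and text in other
            for other_id, other in normalized.items()
        )
        if not contained:
            keep.append(eid)
    return keep


# Hypothetical thread: e2 and e3 continue the conversation, e4 branches off from e1.
thread = {
    "e1": "Is the party on Friday?",
    "e2": "Yes, at 6 pm.\n> Is the party on Friday?",
    "e3": "Great, see you there.\n> Yes, at 6 pm.\n>> Is the party on Friday?",
    "e4": "I can't make it.\n> Is the party on Friday?",
}

print(inclusive_emails(thread))  # ['e3', 'e4']
```

Here e1 and e2 are fully contained in e3, so only e3 and the branch e4 need review. Real mailboxes add complications that this sketch ignores, such as signatures, disclaimers, edited quotes, and attachments.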