Duplicate Detection for eDiscovery

The Stratify Legal Discovery service quickly identifies duplicate and near-duplicate documents and delivers them together, in context, to accelerate review and ensure consistent tagging.


Duplicates are automatically grouped together to accelerate review.

More Efficient and Consistent Tagging

Reviewers can swiftly examine duplicate and near-duplicate documents and tag them individually, or batch-tag them as a group. Reviewing them together keeps tagging consistent and prevents conflicting decisions, such as one near duplicate being tagged responsive while another is tagged privileged.
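As an illustration only, the consistency check described above could be sketched as follows. This is not the Stratify implementation; the function name `find_inconsistent_groups`, the tag strings, and the two input mappings are hypothetical, and the sketch assumes every tagged document already belongs to a duplicate group.

```python
from collections import defaultdict


def find_inconsistent_groups(tags_by_doc, group_by_doc):
    """Return the duplicate groups whose members carry more than one tag.

    tags_by_doc:  maps a document ID to its review tag (hypothetical format)
    group_by_doc: maps a document ID to its duplicate-group ID (hypothetical)
    """
    tags_in_group = defaultdict(set)
    for doc_id, tag in tags_by_doc.items():
        # Collect the distinct tags applied within each duplicate group.
        tags_in_group[group_by_doc[doc_id]].add(tag)
    # A group is inconsistent when its members received different tags.
    return {g: tags for g, tags in tags_in_group.items() if len(tags) > 1}


groups = {"doc1": "g1", "doc2": "g1", "doc3": "g2"}
tags = {"doc1": "responsive", "doc2": "privileged", "doc3": "responsive"}
conflicts = find_inconsistent_groups(tags, groups)
# conflicts contains only group "g1", whose two members were tagged differently
```

Batch-tagging a whole group at once avoids this situation by construction; a check like this would only matter when group members are tagged individually.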

Duplicate documents are grouped as follows:

Exact duplicates contain identical content and identical metadata. For example, monthly backup tape copies of a document are exact duplicates of each other. Exact duplicates are usually culled from the reviewable document universe.

Content duplicates contain identical content but different metadata. For example, identical copies of a document located on a personal hard drive and a central file server are content duplicates.

Content duplicates can be retained in or removed from the reviewable universe. They are often retained in matters where it is important to understand “who knew what when,” and removed in matters where the goal is simply to identify responsive documents, such as Hart-Scott-Rodino (HSR) Second Requests.

Near duplicates are minor variations of the same document. They contain similar but slightly altered content. For example, successive revisions of a text document, spreadsheets with minor cell changes, and emails forwarded from one person to another are near duplicates.
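The three categories above can be illustrated with a minimal sketch. It is an assumption-laden example, not the method the Stratify service uses: the `Document` class, the `classify_pair` function, the SHA-256 content comparison, the character-level similarity from Python's `difflib`, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Document:
    doc_id: str
    content: str
    metadata: dict = field(default_factory=dict)  # e.g. file path, custodian


def content_hash(doc):
    # Identical content always yields an identical digest.
    return hashlib.sha256(doc.content.encode("utf-8")).hexdigest()


def classify_pair(a, b, near_threshold=0.8):
    """Classify two documents as 'exact', 'content', 'near', or 'distinct'.

    near_threshold is an assumed cutoff, not a value from the source.
    """
    if content_hash(a) == content_hash(b):
        # Same content: metadata decides between exact and content duplicate.
        return "exact" if a.metadata == b.metadata else "content"
    # Different content: fall back to a similarity ratio between 0 and 1.
    similarity = SequenceMatcher(None, a.content, b.content).ratio()
    return "near" if similarity >= near_threshold else "distinct"


text = "The quarterly report is attached for your review before Friday."
server = Document("1", text, {"path": "/srv/files/report.txt"})
backup = Document("2", text, {"path": "/srv/files/report.txt"})
laptop = Document("3", text, {"path": "C:/Users/jdoe/report.txt"})
revised = Document("4", text + " Thanks.", {"path": "/srv/files/report.txt"})
other = Document("5", "Lunch menu for Tuesday.", {"path": "/srv/files/menu.txt"})

print(classify_pair(server, backup))   # exact
print(classify_pair(server, laptop))   # content
print(classify_pair(server, revised))  # near
print(classify_pair(server, other))    # distinct
```

Comparing every pair of documents this way does not scale to a large collection; the sketch is meant only to make the distinction between the three categories concrete.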