Text Similarity Checker
Paste two pieces of text and see exactly how much they overlap. The tool uses n-gram phrase matching to highlight repeated phrases and report a similarity percentage. Runs entirely in your browser — nothing is uploaded.
Note: this is a text-vs-text comparison tool. It does not search the internet for sources — for that, you need a paid service like Turnitin, Copyscape, or Quetext.
What this tool does (and what it doesn't)
This is a two-text comparison tool. You paste two passages — for example, an original source and a draft you want to check — and the tool reports how much they overlap. It's the right tool when you already know what you're comparing against: a student essay against the source article, a rewritten paragraph against the one it was paraphrased from, or two revisions of the same document.
It is not a web-scanning plagiarism service. It will not search Google, Turnitin's database, or academic repositories for matches. If you need that, Turnitin, Copyscape, Quetext, and Grammarly's premium scanner all offer it — usually at a cost per scan or a monthly subscription.
How the n-gram matching works
Both texts are tokenised into lowercase words with punctuation stripped. The tool then builds every overlapping phrase of length N (default 4 words) from each text and compares the two sets. The similarity percentage is the share of Text B's phrases that also appear in Text A. This measure is sometimes called the containment coefficient, a close relative of Jaccard similarity. Unlike Jaccard, it is asymmetric: swapping Text A and Text B can change the score.
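If you'd like to see those steps spelled out, here is a minimal sketch in JavaScript. It is an illustration of the approach described above, not the tool's actual source: the function names (tokenize, ngrams, similarity) are hypothetical, and it assumes repeated phrases are counted once and that "punctuation" means anything other than letters, digits, and whitespace.

```javascript
// Illustrative sketch only; helper names are hypothetical, not the tool's real code.

// Lowercase the text, strip punctuation, and split into words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // assumption: keep letters, digits, whitespace
    .split(/\s+/)
    .filter(Boolean);
}

// Build the set of every overlapping n-word phrase (duplicates collapse).
function ngrams(tokens, n) {
  const phrases = new Set();
  for (let i = 0; i + n <= tokens.length; i++) {
    phrases.add(tokens.slice(i, i + n).join(" "));
  }
  return phrases;
}

// Percentage of Text B's distinct phrases that also occur in Text A.
function similarity(textA, textB, n = 4) {
  const a = ngrams(tokenize(textA), n);
  const b = ngrams(tokenize(textB), n);
  if (b.size === 0) return 0; // Text B has fewer than n words
  let shared = 0;
  for (const phrase of b) {
    if (a.has(phrase)) shared++;
  }
  return (100 * shared) / b.size;
}

// Text B has five 4-word phrases; two of them also appear in Text A.
const exampleScore = similarity(
  "The quick brown fox jumps over the lazy dog.",
  "A quick brown fox jumps over a fence."
); // 40
```

Swapping the two arguments gives about 33% instead, because Text A has six phrases rather than five. That is the asymmetry of a containment-style score in practice.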
The match sensitivity dropdown changes N. A smaller N (3-word phrases) is more sensitive — it will flag even short reused fragments, but is more likely to report incidental overlap on common phrasing like "as a result of" or "in the United States". A larger N (5-word phrases) is stricter — it will only flag passages that copy genuinely distinctive wording.
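To make the effect of N concrete, the sketch below scores the same pair of sentences at 3, 4, and 5 words. As before, this is a hedged illustration rather than the tool's real code: phraseSet and scoreBySensitivity are hypothetical names, and the two sentences are made-up examples that share only the stock opening "as a result of the".

```javascript
// Illustrative sketch only; helper names are hypothetical.

// Distinct n-word phrases of a text (lowercased, punctuation stripped).
function phraseSet(text, n) {
  const words = text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .split(/\s+/)
    .filter(Boolean);
  const phrases = new Set();
  for (let i = 0; i + n <= words.length; i++) {
    phrases.add(words.slice(i, i + n).join(" "));
  }
  return phrases;
}

// Score Text B against Text A once per phrase length.
function scoreBySensitivity(textA, textB, sizes = [3, 4, 5]) {
  const scores = {};
  for (const n of sizes) {
    const a = phraseSet(textA, n);
    const b = phraseSet(textB, n);
    let shared = 0;
    for (const phrase of b) {
      if (a.has(phrase)) shared++;
    }
    scores[n] = b.size === 0 ? 0 : (100 * shared) / b.size;
  }
  return scores;
}

const scores = scoreBySensitivity(
  "As a result of the merger, the company cut costs.",
  "As a result of the storm, the town lost power."
);
// scores[3] is 37.5, scores[4] is about 28.6, scores[5] is about 16.7
```

The two sentences are unrelated, yet the shared connective phrase alone produces a sizeable score at N = 3. Raising N shrinks that incidental overlap, which is the trade-off the dropdown controls.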
Reading the result
- Under ~10% at default sensitivity is typical for two unrelated texts on the same topic — they'll share some connective phrases by coincidence.
- 10–30% suggests the author was working from the source but rephrased most of it.
- Above ~40% is a strong signal of direct reuse or light paraphrasing; the highlighted phrases will make it clear where.
- A high score on a short text (say, under 100 words) can be misleading. A short sample yields only a handful of phrases, so each coincidental match moves the percentage a long way.
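The short-text caveat comes down to simple arithmetic: a text of W words contains at most W - N + 1 phrases of N words, so a single matching phrase is worth at least 100 / (W - N + 1) percentage points. The sketch below works that out; percentPerMatch is a hypothetical helper, and it assumes every phrase in the text is distinct.

```javascript
// Illustrative sketch only; percentPerMatch is a hypothetical helper.
// Assumes every n-word phrase in the text is distinct.
function percentPerMatch(wordCount, n = 4) {
  const phraseCount = Math.max(wordCount - n + 1, 0);
  return phraseCount === 0 ? 0 : 100 / phraseCount;
}

const shortText = percentPerMatch(20);  // 17 phrases, about 5.9 points per match
const longText = percentPerMatch(503);  // 500 phrases, 0.2 points per match
```

Under those assumptions, two chance matches in a 20-word text already push the score past 10%, while the same two matches in a 500-word text barely register.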
Good uses
- Checking a rewrite against the original to make sure you actually paraphrased rather than shuffled the words.
- Comparing two drafts of the same article to see how much of the original wording survived the edit.
- Spot-checking a student's answer against the textbook passage you suspect it came from.
- Catching self-plagiarism when you're reusing chunks across your own old posts.
Privacy
Both texts stay on your device. The entire comparison (tokenisation, n-gram extraction, highlighting) runs in your browser with JavaScript. Nothing is sent to our server, logged, or stored. You can verify this by opening the Network tab in your browser's developer tools and clicking Compare; you'll see no outbound request.