Text Summarizer
Summarize long text by extracting its most important sentences. Uses classic TF‑IDF scoring — no AI model, no upload, runs entirely in your browser.
Extractive, not abstractive
There are two flavours of summarisation. Abstractive summarisers (like ChatGPT or Claude) read the text and write new sentences that paraphrase the key ideas. Extractive summarisers pick existing sentences from the text that carry the most information, and return them unchanged. This tool is extractive — every sentence in the summary appears verbatim in your input. That has an important consequence: the summary cannot hallucinate. It can't invent a fact the original didn't contain, because it doesn't write anything.
How a sentence gets picked
The algorithm is classic TF‑IDF sentence scoring. First the text is split into sentences, and stopwords like the, is, and, of are dropped. For every remaining word, the tool computes how often it appears (term frequency) and down‑weights very common words (inverse document frequency). Each sentence is then scored by summing the TF‑IDF weights of its content words, with a small penalty for very long sentences. The N highest‑scoring sentences, where N is set by the length slider, are returned in their original order.
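The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the tool's own JavaScript: the stopword list, the sentence-splitting rule, the smoothed IDF formula, the choice to treat each sentence as a "document", and the LONG_SENTENCE cutoff are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; the tool's real list is not shown here.
STOPWORDS = {"the", "is", "and", "of", "a", "an", "to", "in", "it", "on"}
LONG_SENTENCE = 30  # hypothetical cutoff (in content words) for the length penalty


def split_sentences(text):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence):
    # Lowercase, keep runs of letters and apostrophes, drop stopwords.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n):
    sentences = split_sentences(text)
    words_per_sentence = [content_words(s) for s in sentences]

    # Term frequency: how often each content word occurs in the whole text.
    tf = Counter(w for words in words_per_sentence for w in words)
    # Document frequency: in how many sentences each word occurs.
    # Treating each sentence as a "document" is an assumption of this sketch.
    df = Counter(w for words in words_per_sentence for w in set(words))
    total = len(sentences)

    def score(words):
        # Sum of TF-IDF weights, with a smoothed IDF so no weight is negative.
        raw = sum(tf[w] * (math.log((1 + total) / (1 + df[w])) + 1) for w in words)
        # Small penalty: scale down sentences longer than the cutoff.
        if len(words) > LONG_SENTENCE:
            raw *= LONG_SENTENCE / len(words)
        return raw

    scores = [score(words) for words in words_per_sentence]
    # Rank sentence indices by score, keep the best n, then restore text order.
    ranked = sorted(range(total), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n])
    return [sentences[i] for i in keep]
```

Calling summarize("Cats chase mice. The weather is nice. Cats and mice are natural enemies because cats hunt mice. It is.", 2) keeps the first and third sentences: they share the repeated words cats and mice, while the other two contribute little or nothing. Every returned string is one of the input sentences, unchanged.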
Which inputs work well
- News articles and feature stories — usually have well‑formed lead sentences that score high.
- Research papers and academic prose — topic sentences in each paragraph carry real information density.
- Product documentation, Wikipedia‑style articles, reports — anything where meaning is evenly distributed and sentences are self‑contained.
Which don't
- Dialogue and scripts — meaning is spread across short back‑and‑forth lines that each score low on their own.
- Narrative fiction — plot lives in the sequence, not in a few topic sentences; extractive summaries read oddly.
- Stream‑of‑consciousness writing, forum posts, or heavily formatted content (bullet lists with one‑word items).
- Languages other than English — the stopword list and word‑boundary assumptions are tuned for English. Spanish and French get a reasonable but noisier result; Chinese, Japanese, and Arabic need different tokenisation and won't work well.
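To see why word-boundary assumptions matter for the last item, here is a small Python sketch. The naive_words helper is hypothetical, and whether the tool splits words in exactly this way is an assumption; the point is only that a pattern built for space-separated text finds no word boundaries in Chinese.

```python
import re


def naive_words(text):
    # Word-boundary assumption typical for English: a word is a run of
    # letters or digits, separated by spaces or punctuation.
    return re.findall(r"\w+", text.lower())


english = naive_words("The cat sat on the mat.")
chinese = naive_words("猫坐在垫子上。")  # roughly "The cat sits on the mat."

print(len(english))  # six separate words
print(len(chinese))  # the whole sentence comes back as a single "word"
```

With only one token per sentence, term and document frequencies carry no signal, so the scores say nothing useful about which sentences matter.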
Choosing length
For a news article: Very Short (15%) gives you the headline and lede. Short (25%) is close to a classic "TL;DR". Medium (35%) starts to include context and caveats. Long (50%) is mostly the article minus the anecdotes and quotes. The minimum‑sentences dropdown exists to keep very short inputs from summarising down to a single line.
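As a rough illustration of how a slider position and the minimum-sentences setting could combine into a sentence count, consider the Python sketch below. The percentages come from the text above; reading them as a share of the input's sentences, rounding up, the dictionary keys, and the function name are assumptions.

```python
# Slider positions and their percentages, as quoted in the text.
LENGTH_PERCENT = {"very_short": 15, "short": 25, "medium": 35, "long": 50}


def sentences_to_keep(total_sentences, length, min_sentences=1):
    # Percentage of the sentence count, rounded up using integer arithmetic
    # (rounding up rather than down is an assumption).
    target = (total_sentences * LENGTH_PERCENT[length] + 99) // 100
    # Apply the minimum, but never ask for more sentences than exist.
    return min(total_sentences, max(min_sentences, target))
```

Under these assumptions a 40-sentence article at Short yields 10 sentences, while a 4-sentence note at Very Short with a minimum of 3 yields 3 rather than 1.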
Privacy
The entire scoring pipeline runs in JavaScript in your tab. No AI model is called, no server touches the text, nothing is logged. Close the tab and the text is gone.