Posted · 8 min read
How to Compress PDFs Without Losing Quality
A practical, no-nonsense guide to shrinking PDF files: what actually takes up space, which compression knobs matter, and how to keep text crisp and signatures legal.
You hit "Send," the email bounces, and the reason is always the same: the PDF is too big. The attachment limit is 25 MB, your file is 38 MB, and you have ten minutes before the meeting starts. Sound familiar? Almost everyone who works with PDFs has lived this moment, and almost everyone reaches for the first "compress PDF online" link they can find, prays it doesn't mangle anything important, and sends.
This guide is the longer answer. It explains what actually makes a PDF large, what the different "compression levels" you see in tools really do under the hood, and when each one is safe to use. By the end you should be able to look at a PDF, predict roughly how much smaller it can get, and pick the right approach without rolling the dice.
Why PDFs get so big in the first place
A PDF is essentially a container. Inside it you can find text streams, vector graphics, embedded fonts, raster images, form fields, annotations, JavaScript, attached files, and metadata. When people complain that their PDF is huge, the culprit is almost never the text. Plain text is astonishingly small: an entire novel encodes to a few hundred kilobytes. The bloat comes from three places.
First, embedded raster images, especially scans. A single full-page color scan at 300 DPI is roughly 8.4 megapixels (2550 × 3300 pixels for a US Letter page). Stored uncompressed at 24-bit color that is around 25 MB; even with reasonable JPEG compression it can still be 2-4 MB per page. Multiply by a 30-page document and you have a PDF somewhere between 60 and 120 MB. Second, embedded fonts. A modern OpenType font with full Unicode coverage can be 1-3 MB on its own; a deck that uses six font families balloons quickly. Third, redundant or never-cleaned-up objects: revision history, deleted images that were not actually purged, duplicated copies of the same logo on every page.
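The raw size of a scan is easy to work out by hand. Here is a minimal sketch, assuming a US Letter page (8.5 × 11 inches) and 8 bits per channel; the function name and its parameters are illustrative, not part of any tool.

```python
def raw_scan_size(width_in: float, height_in: float, dpi: int, channels: int = 3) -> tuple[int, int]:
    """Return (pixel_count, uncompressed_bytes) for one page at 8 bits per channel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    # One byte per channel per pixel: 3 channels for RGB color, 1 for grayscale.
    return pixels, pixels * channels


pixels, raw_bytes = raw_scan_size(8.5, 11, 300)
print(f"{pixels / 1e6:.1f} megapixels, {raw_bytes / 1e6:.1f} MB uncompressed")
# prints "8.4 megapixels, 25.2 MB uncompressed"
```

Passing `channels=1` gives the grayscale figure, which is one third of the color one before any compression is applied.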
Knowing which of these dominates your file is the single most useful diagnostic step. A scan-heavy PDF and a slide-export PDF are not the same problem and do not respond to the same fix.
The four real compression techniques
When a PDF compressor advertises "smart compression" or "AI-powered shrinking," it is almost always doing some combination of four well-known operations. Understanding them lets you predict the outcome instead of guessing.
- Image down-sampling. Reducing pixel dimensions of embedded images. A 300 DPI scan resampled to 150 DPI cuts pixel count by 4x, which usually cuts file size by close to 4x. Lossy in the strict sense (you cannot get pixels back) but often invisible on screen.
- Image re-encoding. Switching the codec or quality setting: an uncompressed bitmap to JPEG, or a JPEG quality 95 to JPEG quality 75. Big gains, with quality loss that ranges from imperceptible to obvious depending on how aggressive you go.
- Font subsetting and de-duplication. Embedding only the glyphs the document actually uses, and merging duplicate font copies. Lossless. A document that embeds three full fonts can drop several megabytes here without changing a single pixel.
- Object stream compression and cleanup. Removing orphan objects, compressing internal streams with Flate (zlib), merging identical resources. Fully lossless and almost free in terms of risk.
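To make the lossless end of that list concrete, here is a small sketch of a Flate round trip using Python's standard `zlib` module. The sample bytes are a made-up stand-in for a PDF content stream, not data taken from a real file, and the helper name is hypothetical.

```python
import zlib


def flate_roundtrip(data: bytes) -> tuple[bytes, bytes]:
    """Compress with Flate (zlib), decompress again, and return both results."""
    compressed = zlib.compress(data, level=9)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical stand-in for an uncompressed, repetitive PDF content stream.
stream = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200

compressed, restored = flate_roundtrip(stream)
assert restored == stream  # lossless: every original byte comes back
print(len(stream), "bytes ->", len(compressed), "bytes")
```

The assertion is the whole point: unlike down-sampling or JPEG re-encoding, nothing is discarded, so the risk is close to zero.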
Lossless vs. lossy: pick the right tool for the document
Compression is either lossless (nothing is discarded, so every page renders pixel-for-pixel identical to the original) or lossy (you trade some fidelity for size). The trick is matching the technique to the document's job.
A signed contract, a court filing, a notarized PDF/A archive, an academic paper with mathematical figures: lossless only. A blurred signature scan or a re-JPEGed equation can change a document's legal or scientific meaning. A marketing brochure, a slide deck for an internal meeting, a recipe collection: lossy is fine, often the only way to hit a meaningful size target.
If you cannot easily tell whether a document is in the "never touch the pixels" category, default to lossless. The savings are smaller but you cannot get burned.
What real-world numbers look like
Here is a rough guide for what to expect, drawn from typical office documents. Treat these as orientation, not promises.
Document type            Original   After lossless   After moderate lossy
--------------------------------------------------------------------------
Text-only report         10 MB      8 MB             7 MB
Mixed text + 5 photos    18 MB      15 MB            4 MB
Slide deck (PNG-heavy)   40 MB      32 MB            6 MB
Scanned 30-page PDF      50 MB      48 MB            5 MB
Ebook with cover art     12 MB      10 MB            3 MB
Why text-only PDFs barely shrink
If your file is mostly text and you compress it and barely anything happens, you are not doing it wrong. Nearly every PDF generator already compresses content streams with Flate by default, so there is not much slack to squeeze out. The only meaningful wins for text-only documents are font subsetting, removing unused metadata, and stripping any forgotten embedded files. Realistic expectation: 15-25% reduction, full stop. Anyone promising 90% compression on a pure text PDF is either re-rasterizing it (turning your crisp text into a blurry image) or lying.
This matters because it changes how you should react. If a 10 MB legal brief refuses to drop below 8 MB, that is the floor. Splitting it into two PDFs is a more honest fix than mangling the text to chase a number.
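The floor is simple arithmetic. Here is a rough sketch, using the 15-25% range quoted above as default parameters; both function names are illustrative.

```python
def reduction_pct(original_mb: float, compressed_mb: float) -> float:
    """Percentage saved going from original_mb to compressed_mb."""
    return (original_mb - compressed_mb) / original_mb * 100


def lossless_floor_mb(original_mb: float, low_pct: float = 15.0, high_pct: float = 25.0) -> tuple[float, float]:
    """Return (best_case_mb, worst_case_mb) after lossless cleanup of a text-only PDF."""
    best = original_mb * (1 - high_pct / 100)
    worst = original_mb * (1 - low_pct / 100)
    return best, worst


best, worst = lossless_floor_mb(10)
print(f"A 10 MB text PDF should land between {best:.1f} and {worst:.1f} MB")
print(f"10 MB -> 8 MB is a {reduction_pct(10, 8):.0f}% reduction")
```

If the size you need is below the best case, no lossless setting will get you there, and splitting the file is the honest option.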
Why scan-heavy PDFs shrink dramatically
The flip side: a 50 MB scanned document can routinely drop to 5 MB with no visible quality loss for screen reading. Why? Because most scanners default to 300 DPI color, which is overkill for documents you will read on a monitor. 150 DPI is plenty for body text on a screen, and a moderate JPEG quality is invisible at normal zoom. You are not destroying information so much as you are removing information your eyes will never use.
If the document is meant to be printed, hold the line at 200-300 DPI. If it is meant to be emailed and read on a laptop, 150 DPI in grayscale is usually the sweet spot. If it contains tiny handwriting or fine engineering linework, test one page first before committing to the whole batch.
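Because pixel count scales with the square of the resolution, the effect of a DPI choice can be estimated before running anything. This is a minimal sketch with a hypothetical helper name; it covers down-sampling only and ignores the further savings from JPEG quality or grayscale conversion.

```python
def downsample_factor(source_dpi: int, target_dpi: int) -> float:
    """Fraction of the original pixels that remain after resampling (never upsamples)."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    if target_dpi >= source_dpi:
        return 1.0  # already at or below the target; leave the image alone
    return (target_dpi / source_dpi) ** 2


for target in (300, 200, 150, 96, 72):
    share = downsample_factor(300, target)
    print(f"300 -> {target} DPI keeps {share:.0%} of the pixels")
```

Going from 300 to 150 DPI keeps a quarter of the pixels, so down-sampling alone would take a 50 MB scan to roughly 12.5 MB. The rest of the drop toward 5 MB comes from re-encoding at a moderate JPEG quality and, where it applies, converting to grayscale.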
What NOT to do
- Do not compress the same PDF twice with lossy settings. Each pass re-encodes JPEGs, and the artifacts stack. After three rounds your scan looks like it was faxed in 1994.
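- Do not run lossy compression on signed legal documents, contracts, or anything destined for a court filing. Even subtle pixel changes can void the document's evidentiary value, and visible signature degradation looks suspicious. If the PDF carries a cryptographic digital signature, be aware that any rewrite of the file, even a lossless one, will invalidate it; compress before signing, not after.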
- Do not OCR a scan, then compress lossily, then OCR again. The second OCR pass on the degraded image will produce worse text and you will have lost the original.
- Do not upload sensitive PDFs (medical records, NDAs, tax returns) to random web compressors. Many keep your file on their servers; some explicitly grant themselves rights to it. Read the terms or use a tool that runs locally in your browser.
- Do not assume that smaller is always better. A 200 KB PDF that no one can read because the OCR text was thrown away is worse than a 5 MB PDF that searches correctly.
A simple decision tree
When you have a PDF you need to shrink, walk through these questions in order. Ninety percent of the time the answer falls out within thirty seconds.
- Is the document legally sensitive (signed, notarized, official filing)? Lossless only. Stop here.
- Is the file mostly text with no large images? Run lossless cleanup; expect 15-25% savings; consider splitting if you need more.
- Is the file dominated by scans or photos and meant for screen reading? Down-sample to 150 DPI and re-encode at moderate JPEG quality. Expect 60-90% savings.
- Is the file a slide deck or marketing PDF with PNG screenshots? Convert PNGs to JPEG where the content is photographic; keep PNG where there is sharp text or line art.
- Is the file destined for print? Keep images at 200-300 DPI minimum. Lossless cleanup only.
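The questions above translate almost directly into code. The sketch below is one possible encoding, not how any particular tool works: `PdfProfile` and its flags are hypothetical names, and the fallback for a document that matches none of the questions is the "default to lossless" advice from earlier in this guide.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    """Hypothetical answers to the five questions, one flag per question."""
    legally_sensitive: bool = False
    mostly_text: bool = False
    scan_or_photo_heavy: bool = False
    png_heavy_deck: bool = False
    for_print: bool = False


def recommend(profile: PdfProfile) -> str:
    """Walk the questions in order and return the first matching advice."""
    if profile.legally_sensitive:
        return "lossless only"
    if profile.mostly_text:
        return "lossless cleanup; expect 15-25% savings; split if you need more"
    if profile.scan_or_photo_heavy and not profile.for_print:
        return "down-sample to 150 DPI, moderate JPEG quality; expect 60-90% savings"
    if profile.png_heavy_deck:
        return "convert photographic PNGs to JPEG; keep PNG for sharp text and line art"
    if profile.for_print:
        return "keep images at 200-300 DPI; lossless cleanup only"
    # None of the questions matched: fall back to the safe default.
    return "lossless cleanup"


print(recommend(PdfProfile(scan_or_photo_heavy=True)))
```

Order matters here: a signed scan returns "lossless only" because the legal question is asked first, and a scan destined for print skips the 150 DPI branch.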
How browser-based compression compares
Most online PDF compressors upload your file, process it on a server, and send a smaller version back. That works, but it has two costs: your document leaves your machine, and you wait through a round-trip on every change. Browser-based tools (including Multilities' /tools/pdf-compress) do the work locally using WebAssembly. Nothing is uploaded, the response is instant on small files, and you can compress a folder of receipts on the train without burning mobile data.
Browser compression has tradeoffs too: very large PDFs (a few hundred megabytes) can strain a phone's memory, and the heaviest re-encoding pipelines run a touch slower than a beefy server. For the everyday 5-50 MB range that covers most real-world documents, the local approach is faster end-to-end once you count upload time, and your data never leaves the device.
Specific tactics that punch above their weight
If you want a few quick wins that work on almost any PDF, these are the highest-leverage tweaks.
- Strip embedded thumbnails. Some PDF generators bake a thumbnail of every page into the file. They add up fast on long documents and almost no modern viewer needs them.
- Remove unused form fields and JavaScript. Old form templates often carry inert scripts and definitions for fields nobody filled in.
- Flatten annotations and comments. If you do not need to keep editing them, flattening turns them into part of the page and lets the cleanup pass remove the underlying objects.
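- Convert color scans to grayscale when color is not informative. Before compression, a grayscale scan at the same DPI is one third the size of color (one channel instead of three); after JPEG encoding the gap is usually smaller, but still worth having.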
- Re-export from the source. If the original file is a Word document or a Keynote deck, exporting fresh with "Smallest size" or "Reduced quality" settings often beats anything you can do to an already-bloated PDF.
What a good compressor's settings actually mean
Most tools expose three or four levels: Low, Medium, High, Extreme, or sometimes friendlier names like "Print quality," "Screen quality," "Email-ready." Translated into the four techniques above, they usually map like this.
- "Low" or "Print" runs lossless cleanup only and preserves images at 300 DPI.
- "Medium" or "Screen" down-samples to around 150 DPI and re-encodes JPEGs at quality 80.
- "High" or "Email" pushes to 96-120 DPI and JPEG quality 60.
- "Extreme" can drop to 72 DPI and quality 40, where text in scans starts to look mushy.
If you find the labels confusing, run one trial page at the most aggressive setting you are considering and look at it carefully at normal zoom. If the trial looks fine, you can usually trust the same setting on the rest of the document.
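Written down as data, the typical mapping looks something like the sketch below. The numbers are the ones quoted in this section; real tools vary, and the `Preset` type, the alias table, and the `resolve` helper are all illustrative names rather than any product's actual configuration.

```python
from typing import NamedTuple, Optional, Tuple


class Preset(NamedTuple):
    dpi_range: Tuple[int, int]    # (lowest, highest) target resolution
    jpeg_quality: Optional[int]   # None means images are not re-encoded


PRESETS = {
    "low": Preset((300, 300), None),   # lossless cleanup only
    "medium": Preset((150, 150), 80),
    "high": Preset((96, 120), 60),
    "extreme": Preset((72, 72), 40),
}

# Friendlier labels some tools use instead of Low / Medium / High.
ALIASES = {"print": "low", "screen": "medium", "email": "high"}


def resolve(label: str) -> Preset:
    """Map a UI label such as "Print quality" or "Email-ready" to a preset."""
    words = label.strip().lower().replace("-", " ").split()
    if not words:
        raise ValueError("empty preset label")
    key = ALIASES.get(words[0], words[0])
    return PRESETS[key]  # raises KeyError for labels this sketch does not know


print(resolve("Email-ready"))
```

Seeing the presets as plain numbers makes it easier to judge an unfamiliar tool: if its "Medium" produces output that looks more like 72 DPI at quality 40, its labels are more aggressive than this table assumes.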
Putting it together
Compressing a PDF well is mostly about matching the technique to the document. A signed contract gets lossless cleanup. A 50 MB scanned report gets aggressive down-sampling. A text-heavy academic paper gets accepted as is, because there is nothing left to squeeze. The tools that work best are the ones that let you pick a level and tell you honestly what they did.
The next time your email bounces, take ten seconds to ask what kind of PDF you actually have. Then pick the right setting once, instead of running it through five different compressors and ending up with a blurry, watermarked version of a document that was probably fine after a single careful pass. If you want a no-upload option that runs in the browser and shows you the before/after size before you commit, Multilities' PDF compress tool is built around exactly this workflow. Either way, the ideas are the same: know what is making your file big, pick the lightest tool that solves it, and stop when the job is done.