PDF Compression: How to Reduce File Size Without Losing Quality
Why PDFs Get Large
A PDF is a container format that can hold text, images, fonts, vector graphics, JavaScript, multimedia, and metadata. File size bloat usually comes from a few specific sources:
| Source | Typical Impact | Example |
|---|---|---|
| High-res images | 60-90% of file size | A single 300 DPI photo can be 5-15 MB |
| Embedded fonts | 200 KB - 5 MB per font | CJK fonts can exceed 10 MB each |
| Uncompressed streams | 2-5x larger than needed | Text and vector data without Flate compression |
| Duplicate resources | Variable | Same image embedded on every page instead of referenced once |
| Metadata and thumbnails | 100 KB - 2 MB | Page thumbnails, XMP metadata, document history |
| Incremental saves | 10-50% overhead | Each save appends changes instead of rewriting |
Use our PDF Info tool to analyze what is consuming space in your file.
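Incremental saves can also be spotted without a dedicated tool. The sketch below is a rough heuristic, not a parser: it counts `%%EOF` markers in the raw bytes, on the assumption that each appended revision ends with its own marker. The byte strings are synthetic stand-ins for real files, and the function names are made up for illustration. Linearized files and binary streams may contain extra markers, so treat a count above one as a hint rather than proof.

```python
import re


def count_eof_markers(data: bytes) -> int:
    """Count '%%EOF' markers in raw PDF bytes (heuristic only)."""
    return len(re.findall(rb"%%EOF", data))


def looks_incrementally_saved(data: bytes) -> bool:
    """More than one marker suggests changes were appended to the file."""
    return count_eof_markers(data) > 1


# Synthetic stand-ins for a file saved once and the same file after an
# incremental save (not valid PDFs, just enough structure for the demo).
original = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\nstartxref\n9\n%%EOF\n"
appended = original + b"2 0 obj\n<<>>\nendobj\ntrailer\n<<>>\nstartxref\n70\n%%EOF\n"

print(count_eof_markers(original), count_eof_markers(appended))
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to `looks_incrementally_saved`.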
Compression Methods
Image Downsampling
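Downsampling is usually the most effective compression technique. It reduces image resolution: a 300 DPI image downsampled to 150 DPI keeps roughly 1/4 of the pixels, because both width and height are halved.
- Bicubic downsampling: best quality; each output pixel is a weighted average of the surrounding source pixels
- Average downsampling: faster, slightly lower quality; each output pixel is the plain average of the source pixels it covers
- Subsampling: fastest; keeps a single source pixel per output pixel (can look blocky)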
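The 1/4 figure follows directly from the arithmetic. As a small sketch, assuming a full US Letter page (8.5 x 11 inches) scanned as 8-bit RGB, the uncompressed size at each resolution works out as follows. The function name and the page dimensions are illustrative choices, not part of any PDF tool.

```python
def raw_image_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Uncompressed size in bytes of an image covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


full = raw_image_bytes(8.5, 11, 300)  # 2550 x 3300 pixels
half = raw_image_bytes(8.5, 11, 150)  # 1275 x 1650 pixels

print(full, half, full / half)  # 25245000 6311250 4.0
```

Halving the DPI halves both dimensions, so the raw data shrinks by a factor of four before any recompression is applied.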
Image Recompression
| Format | Type | Best For | Quality |
|---|---|---|---|
| JPEG | Lossy | Photos, scans | Good at quality 75-85 |
| JPEG2000 | Lossy/Lossless | High-quality photos | Better than JPEG at same size |
| JBIG2 | Lossy/Lossless | Black & white text/scans | Typically several times smaller than CCITT Group 4 |
| Flate (ZIP) | Lossless | Screenshots, diagrams | Perfect, moderate compression |
| CCITT Group 4 | Lossless | B&W fax-quality scans | Perfect for 1-bit images |
Font Subsetting
A full font file contains thousands of glyphs. If your document uses only 50 characters, subsetting removes the unused glyphs. A 2 MB font might shrink to 30 KB after subsetting.
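To check whether a font in an existing PDF is already subsetted, one convention helps: the PDF specification has subset fonts carry a tag of six uppercase letters followed by `+` in front of the font name. The sketch below tests a name against that pattern. The helper name and the sample font names are hypothetical, and extracting the `BaseFont` names from a file is left to whichever PDF library you use.

```python
import re

# Subset tag convention: six uppercase letters, then '+', then the font name.
# The optional leading '/' allows for names printed in PDF name syntax.
SUBSET_TAG = re.compile(r"^/?[A-Z]{6}\+")


def is_subset_font_name(base_font: str) -> bool:
    """Return True if the font name carries a subset tag."""
    return bool(SUBSET_TAG.match(base_font))


# Hypothetical font names for illustration
for name in ["/ABCDEF+Helvetica", "/Helvetica", "XYZABC+NotoSansCJK"]:
    print(name, is_subset_font_name(name))
```

A font that is embedded without such a tag is a candidate for subsetting.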
Content Stream Compression
PDF content streams (text positioning, vector drawing commands) can be compressed with Flate/Deflate. Uncompressed streams are surprisingly common in PDFs generated by older software.
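To get a feel for how much Flate saves on this kind of data, the sketch below compresses a synthetic content stream with Python's standard `zlib` module, which implements the same Deflate algorithm. The stream is made up for illustration (50 repeated text-drawing operations); real streams vary, so the ratio here is only indicative.

```python
import zlib

# Hypothetical content stream: 50 lines of text-drawing operators
lines = [
    f"BT /F1 12 Tf 72 {720 - 14 * i} Td (Line {i} of sample text) Tj ET"
    for i in range(50)
]
raw = "\n".join(lines).encode("ascii")

compressed = zlib.compress(raw, level=9)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")

# Flate is lossless: decompressing returns the exact original bytes
assert zlib.decompress(compressed) == raw
```

Content streams are highly repetitive operator sequences, which is why they compress so well.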
Object Deduplication
When the same image appears on multiple pages, it should be stored once and referenced by each page. Some PDF generators embed a separate copy per page; deduplication fixes this.
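The idea behind deduplication can be sketched with content hashing: streams with identical bytes produce identical digests, so they can be grouped and collapsed to a single object. The example below works on an in-memory dictionary of hypothetical object names and byte strings. It illustrates the detection step only and is not how any particular tool implements it.

```python
import hashlib
from collections import defaultdict


def find_duplicate_streams(streams):
    """Group stream names whose raw bytes are identical."""
    by_digest = defaultdict(list)
    for name, data in streams.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


# Hypothetical streams: the same logo embedded separately on three pages
logo = b"\x89fake-image-bytes" * 100
streams = {
    "page1/Im0": logo,
    "page2/Im0": logo,
    "page3/Im0": logo,
    "page2/Im1": b"a different image",
}

print(find_duplicate_streams(streams))
```

Each group returned here could be replaced by one stored object that every page references.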
Lossy vs Lossless Compression
| Aspect | Lossless | Lossy |
|---|---|---|
| Quality | Identical to original | Some degradation |
| Size reduction | 10-40% | 50-90% |
| Techniques | Flate, font subset, dedup, metadata removal | Image downsample, JPEG recompress, JBIG2 |
| Best for | Legal docs, archival, print-ready | Web, email, screen viewing |
| Reversible | Yes | No |
For most use cases, a combination works best: lossless techniques first (metadata removal, stream compression, deduplication), then lossy image compression if more reduction is needed.
Recommended Settings by Use Case
| Use Case | Image DPI | JPEG Quality | Target Size |
|---|---|---|---|
| Screen / Web viewing | 72-96 DPI | 60-75 | Smallest possible |
| Email attachment | 96-150 DPI | 70-80 | Under 5 MB |
| Office printing | 150-200 DPI | 80-85 | Moderate |
| Professional print | 300 DPI | 90-95 | Quality priority |
| Archival (PDF/A) | Original | Lossless only | No size target |
Try our PDF Compressor: it processes files entirely in your browser with no upload required.
Ghostscript Commands
Ghostscript is the most powerful free tool for PDF compression. The -dPDFSETTINGS flag provides preset compression levels:
```shell
# Screen quality (72 DPI, smallest)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
  -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH \
  -sOutputFile=output.pdf input.pdf

# Ebook quality (150 DPI, good balance)
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
  -dNOPAUSE -dBATCH -sOutputFile=output.pdf input.pdf

# Printer quality (300 DPI)
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer \
  -dNOPAUSE -dBATCH -sOutputFile=output.pdf input.pdf

# Custom: downsample color and grayscale images to 150 DPI
# Note: -dJPEGQ is a setting of the jpeg output device and is not used by
# pdfwrite, so it is omitted here. With pdfwrite, JPEG quality is governed
# by the distiller parameters (for example the QFactor in ColorImageDict).
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
  -dDownsampleColorImages=true -dColorImageResolution=150 \
  -dDownsampleGrayImages=true -dGrayImageResolution=150 \
  -dNOPAUSE -dBATCH \
  -sOutputFile=output.pdf input.pdf
```
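For batch work, the preset commands can be wrapped in a small Python helper. This is a minimal sketch, assuming the `gs` executable is on the `PATH` (on Windows it is usually named differently). The function names are invented for illustration, and only flags from the commands above plus `-dQUIET` are used.

```python
import subprocess

# Preset names accepted by -dPDFSETTINGS in the commands above
PRESETS = {"screen", "ebook", "printer", "prepress"}


def build_gs_command(src, dst, preset="ebook"):
    """Build the Ghostscript argument list for one of the presets."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src, dst, preset="ebook"):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_gs_command(src, dst, preset), check=True)


print(" ".join(build_gs_command("input.pdf", "output.pdf", "screen")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.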
Python Libraries
pikepdf (recommended)
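```python
import pikepdf

# Open the file and rewrite it with lossless structural optimizations
with pikepdf.open('input.pdf') as pdf:
    pdf.save(
        'output.pdf',
        compress_streams=True,  # Flate-compress any uncompressed streams
        object_stream_mode=pikepdf.ObjectStreamMode.generate,  # pack objects into object streams
        linearize=True,  # optimize for web viewing
        # normalize_content is left out: it is a debugging aid, does not
        # reduce size, and pikepdf does not allow combining it with linearize
    )
```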
PyPDF2
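```python
# Note: PyPDF2 is deprecated in favor of its successor, pypdf.
from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader('input.pdf')
writer = PdfWriter()

for page in reader.pages:
    page.compress_content_streams()  # Flate-compress the page content (CPU-intensive)
    writer.add_page(page)

# Copy document metadata only if the source file has any
if reader.metadata:
    writer.add_metadata(reader.metadata)

with open('output.pdf', 'wb') as f:
    writer.write(f)
```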
Compression Comparison Table
| Preset | DPI | 10 MB Photo PDF | 50 MB Report | Quality Loss |
|---|---|---|---|---|
| /screen | 72 | ~0.5 MB (95%↓) | ~5 MB (90%↓) | Noticeable on zoom |
| /ebook | 150 | ~1.5 MB (85%↓) | ~12 MB (76%↓) | Slight on close inspection |
| /printer | 300 | ~4 MB (60%↓) | ~25 MB (50%↓) | Minimal |
| /prepress | 300+ | ~7 MB (30%↓) | ~40 MB (20%↓) | None visible |
| Lossless only | Original | ~8 MB (20%↓) | ~35 MB (30%↓) | None |
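The percentages in the table are plain size reductions relative to the original. To measure the same figure for your own files, a helper along these lines is enough. The function names are illustrative, and the file-based variant assumes both files exist on disk.

```python
import os


def percent_reduction(original_size, compressed_size):
    """Size reduction as a whole-number percentage of the original."""
    return round(100 * (1 - compressed_size / original_size))


def percent_reduction_for_files(original_path, compressed_path):
    """Same calculation, reading the sizes from disk."""
    return percent_reduction(
        os.path.getsize(original_path), os.path.getsize(compressed_path)
    )


# Rows from the table above, in MB
print(percent_reduction(10, 0.5))  # 95  (/screen, photo PDF)
print(percent_reduction(50, 12))   # 76  (/ebook, report)
print(percent_reduction(50, 35))   # 30  (lossless only, report)
```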
Practical Tips
- Check image resolution first: Use PDF Info to see embedded image DPI. A 600 DPI scan viewed on screen wastes more than 90% of its pixels.
- Convert CMYK to sRGB: CMYK images carry four channels instead of three, so their raw data is about 33% larger than the RGB equivalent. Convert to sRGB unless printing professionally.
- Remove hidden layers: CAD exports and Illustrator files often contain hidden layers that add significant size.
- Flatten transparency: Complex transparency effects increase file size and rendering time.
- Strip metadata: Remove XMP metadata, document history, and page thumbnails.
- Linearize for web: "Fast web view" reorganizes the PDF so the first page loads before the entire file downloads.
- Audit fonts: Check for fully embedded fonts that could be subsetted, or system fonts that don't need embedding at all.
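The resolution check in the first tip comes down to simple arithmetic: an image's effective DPI is its pixel width divided by the width it occupies on the page in inches, and PDF page units are points (72 per inch). The sketch below shows the calculation. The function names and the sample numbers are illustrative assumptions, not output from any tool.

```python
def effective_dpi(pixel_width, placed_width_pt):
    """Effective resolution of an image placed at a given width in points."""
    return pixel_width / (placed_width_pt / 72)


def wasted_pixel_fraction(source_dpi, needed_dpi):
    """Fraction of pixels beyond what the target resolution requires."""
    if needed_dpi >= source_dpi:
        return 0.0
    return 1 - (needed_dpi / source_dpi) ** 2


# A 5100-pixel-wide scan placed across a full 8.5 inch (612 pt) page
dpi = effective_dpi(5100, 612)
print(dpi)                              # 600.0
print(wasted_pixel_fraction(dpi, 150))  # 0.9375
```

At a 150 DPI target, a 600 DPI scan carries 16 times more pixels than needed, which is where the "more than 90%" figure comes from.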
Frequently Asked Questions
Does compressing a PDF reduce quality?
It depends on the method. Lossless compression (removing metadata, optimizing streams) preserves quality perfectly. Lossy compression (downsampling images, JPEG recompression) reduces quality but achieves much smaller sizes. Choose based on your use case.
What is a good PDF file size for email?
Under 5 MB is a safe target for most email providers. Gmail allows attachments up to 25 MB, but many corporate servers cap them at 10 MB. Use screen or ebook quality compression (72-150 DPI) for email attachments.
Why is my PDF so large?
Common causes: high-resolution images (300+ DPI), fully embedded fonts (especially CJK), uncompressed content streams, duplicate resources across pages, and incremental saves that append data instead of rewriting.
Can I compress a PDF without software?
Yes. Online tools like ThePDF's compressor work entirely in your browser, so the file never leaves your device. No installation needed.
What is the difference between PDF optimization and compression?
Compression reduces data size using algorithms (Flate, JPEG). Optimization is broader: it includes compression plus removing unused objects, deduplicating resources, subsetting fonts, and linearizing for web viewing.