PDF Compression: How to Reduce File Size Without Losing Quality


Why PDFs Get Large

A PDF is a container format that can hold text, images, fonts, vector graphics, JavaScript, multimedia, and metadata. File size bloat usually comes from a few specific sources:

| Source | Typical Impact | Example |
| --- | --- | --- |
| High-res images | 60-90% of file size | A single 300 DPI photo can be 5-15 MB |
| Embedded fonts | 200 KB - 5 MB per font | CJK fonts can exceed 10 MB each |
| Uncompressed streams | 2-5x larger than needed | Text and vector data without Flate compression |
| Duplicate resources | Variable | Same image embedded on every page instead of referenced once |
| Metadata and thumbnails | 100 KB - 2 MB | Page thumbnails, XMP metadata, document history |
| Incremental saves | 10-50% overhead | Each save appends changes instead of rewriting |

Use our PDF Info tool to analyze what is consuming space in your file.
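As a rough illustration of the incremental-save overhead, the sketch below counts end-of-file markers in raw PDF bytes, since each appended update normally ends with its own %%EOF. This is a heuristic under stated assumptions, not a parser: linearized files legitimately contain two markers, and the sequence can also occur inside stream data. The function name and the sample bytes are made up for the example.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers in raw PDF bytes (heuristic, not a real parser)."""
    return data.count(b"%%EOF")


# Hypothetical miniature "file": an original body plus one appended update.
sample = (
    b"%PDF-1.4\n...original objects...\n%%EOF\n"
    b"...objects appended by a later save...\n%%EOF\n"
)

print(count_eof_markers(sample))  # 2
```

For a real file, read it with open(path, "rb").read() and treat a count well above 2 as a hint that rewriting the file from scratch may recover space.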

Compression Methods

Image Downsampling

The most effective compression technique. Downsampling reduces image resolution: a 300 DPI image downsampled to 150 DPI keeps half the pixels in each dimension, so it has one quarter of the original pixel count.
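The arithmetic can be checked with a short sketch. The helper name and the page dimensions (a US Letter page scanned at 300 DPI) are assumptions chosen for illustration.

```python
def pixels_after_downsample(width_px, height_px, source_dpi, target_dpi):
    """Return the pixel dimensions after scaling from source_dpi to target_dpi."""
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# 8.5 x 11 inch page at 300 DPI (assumed example dimensions)
w, h = 2550, 3300
new_w, new_h = pixels_after_downsample(w, h, 300, 150)
ratio = (new_w * new_h) / (w * h)

print(new_w, new_h)  # 1275 1650
print(ratio)         # 0.25
```

Because pixel count scales with the square of the DPI ratio, small resolution reductions produce large savings before any recompression is applied.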

Image Recompression

| Format | Type | Best For | Quality |
| --- | --- | --- | --- |
| JPEG | Lossy | Photos, scans | Good at quality 75-85 |
| JPEG2000 | Lossy/Lossless | High-quality photos | Better than JPEG at same size |
| JBIG2 | Lossy/Lossless | Black & white text/scans | Typically 3-5x smaller than CCITT Group 4 in lossless mode |
| Flate (ZIP) | Lossless | Screenshots, diagrams | Perfect, moderate compression |
| CCITT Group 4 | Lossless | B&W fax-quality scans | Perfect for 1-bit images |

Font Subsetting

A full font file contains thousands of glyphs. If your document uses only 50 characters, subsetting removes the unused glyphs. A 2 MB font might shrink to 30 KB after subsetting.

Content Stream Compression

PDF content streams (text positioning, vector drawing commands) can be compressed with Flate/Deflate. Uncompressed streams are surprisingly common in PDFs generated by older software.
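To see why this matters, here is a minimal sketch using Python's standard zlib module, which produces the same zlib/deflate format that the FlateDecode filter expects. The content stream is a made-up sequence of text-drawing operators, not taken from a real file, so the exact ratio will differ in practice.

```python
import zlib

# Hypothetical content stream: 200 repeated text-drawing operations.
stream = b"".join(
    b"BT /F1 12 Tf 72 %d Td (Line %d) Tj ET\n" % (720 - i, i)
    for i in range(200)
)

compressed = zlib.compress(stream, level=9)

# Flate is lossless: decompressing returns the exact original bytes.
assert zlib.decompress(compressed) == stream

print(len(stream), "bytes uncompressed")
print(len(compressed), "bytes after Flate")
```

Content streams are highly repetitive (the same operators and font names recur on every line), which is why they tend to compress well.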

Object Deduplication

When the same image appears on multiple pages, it should be stored once and referenced by each page. Some PDF generators embed a separate copy per page; deduplication fixes this.
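The idea can be sketched with content hashing: identical byte streams produce identical digests, so each distinct stream is kept once and every page stores an index instead of a copy. This is a simplified model with hypothetical names, not how any particular PDF tool is implemented.

```python
import hashlib


def deduplicate(streams):
    """Return (unique_streams, refs), where refs[i] indexes into unique_streams."""
    index_by_digest = {}
    unique = []
    refs = []
    for data in streams:
        digest = hashlib.sha256(data).digest()
        if digest not in index_by_digest:
            index_by_digest[digest] = len(unique)
            unique.append(data)
        refs.append(index_by_digest[digest])
    return unique, refs


# Made-up data: the same "logo" embedded on three of four pages.
logo = b"fake-logo-bytes" * 1000
pages = [logo, b"chart-for-page-2", logo, logo]

unique, refs = deduplicate(pages)
print(len(unique), refs)  # 2 [0, 1, 0, 0]
```

In this toy case the logo is stored once instead of three times, which is the saving a deduplicating optimizer aims for.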

Lossy vs Lossless Compression

| Aspect | Lossless | Lossy |
| --- | --- | --- |
| Quality | Identical to original | Some degradation |
| Size reduction | 10-40% | 50-90% |
| Techniques | Flate, font subset, dedup, metadata removal | Image downsample, JPEG recompress, JBIG2 |
| Best for | Legal docs, archival, print-ready | Web, email, screen viewing |
| Reversible | Yes | No |

For most use cases, a combination works best: lossless techniques first (metadata removal, stream compression, deduplication), then lossy image compression if more reduction is needed.

Recommended Settings by Use Case

| Use Case | Image DPI | JPEG Quality | Target Size |
| --- | --- | --- | --- |
| Screen / Web viewing | 72-96 DPI | 60-75 | Smallest possible |
| Email attachment | 96-150 DPI | 70-80 | Under 5 MB |
| Office printing | 150-200 DPI | 80-85 | Moderate |
| Professional print | 300 DPI | 90-95 | Quality priority |
| Archival (PDF/A) | Original | Lossless only | No size target |
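If you script your compression, the table above can be encoded as a small lookup. The sketch below takes the upper end of each range; the key names and the Settings type are hypothetical and not part of any library.

```python
from typing import NamedTuple, Optional


class Settings(NamedTuple):
    max_dpi: Optional[int]       # None means keep the original resolution
    jpeg_quality: Optional[int]  # None means lossless only


# Upper ends of the ranges in the table; the keys are made-up names.
PRESETS = {
    "screen": Settings(96, 75),
    "email": Settings(150, 80),
    "office_print": Settings(200, 85),
    "professional_print": Settings(300, 95),
    "archival": Settings(None, None),
}


def settings_for(use_case: str) -> Settings:
    """Look up compression settings, failing loudly on an unknown use case."""
    try:
        return PRESETS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


print(settings_for("email"))  # Settings(max_dpi=150, jpeg_quality=80)
```

Raising on an unknown key is a deliberate choice here: silently falling back to a lossy preset could degrade a document that was meant for archival.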

Try our PDF Compressor. It processes files entirely in your browser with no upload required.

Ghostscript Commands

Ghostscript is the most powerful free tool for PDF compression. The -dPDFSETTINGS flag provides preset compression levels:

# Screen quality (72 DPI, smallest)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH \
   -sOutputFile=output.pdf input.pdf

# Ebook quality (150 DPI, good balance)
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dBATCH -sOutputFile=output.pdf input.pdf

# Printer quality (300 DPI)
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer \
   -dNOPAUSE -dBATCH -sOutputFile=output.pdf input.pdf

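# Custom: downsample color and gray images to 150 DPI.
# pdfwrite ignores -dJPEGQ (a jpeg-device option); JPEG quality is set
# through QFactor instead, where lower values mean higher quality.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dDownsampleColorImages=true -dColorImageResolution=150 \
   -dDownsampleGrayImages=true -dGrayImageResolution=150 \
   -dNOPAUSE -dBATCH -sOutputFile=output.pdf \
   -c "<< /ColorACSImageDict << /QFactor 0.4 /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> >> setdistillerparams" \
   -f input.pdf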
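To call the preset commands from a script, a thin wrapper around subprocess is usually enough. The sketch below assumes the gs executable is on your PATH; the function names are hypothetical, and the flags are the same ones used in the commands above plus -dQUIET to suppress routine output.

```python
import subprocess

# Preset names accepted by -dPDFSETTINGS
ALLOWED_PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def build_gs_command(src, dst, preset="ebook"):
    """Build the Ghostscript argument list for a preset-based compression run."""
    if preset not in ALLOWED_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src, dst, preset="ebook"):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_gs_command(src, dst, preset), check=True)


print(" ".join(build_gs_command("input.pdf", "output.pdf", "screen")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.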

Python Libraries

pikepdf (recommended)

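import pikepdf

# Open the file and rewrite it with stream-level optimizations.
# normalize_content is left out: it is meant for debugging, does not
# reduce size, and is not meant to be combined with linearize.
with pikepdf.open('input.pdf') as pdf:
    pdf.save(
        'output.pdf',
        compress_streams=True,   # Flate-compress uncompressed streams
        recompress_flate=True,   # re-deflate streams that are already Flate-compressed
        object_stream_mode=pikepdf.ObjectStreamMode.generate,  # pack objects into object streams
        linearize=True,          # optimize for web viewing
    )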

PyPDF2

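# PyPDF2 is no longer maintained; its successor is the pypdf package.
from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader('input.pdf')
writer = PdfWriter()

for page in reader.pages:
    page.compress_content_streams()  # Flate-compress page content (CPU intensive)
    writer.add_page(page)

# reader.metadata is None when the source has no document info dictionary
if reader.metadata:
    writer.add_metadata(reader.metadata)

with open('output.pdf', 'wb') as f:
    writer.write(f)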

Compression Comparison Table

| Preset | DPI | 10 MB Photo PDF | 50 MB Report | Quality Loss |
| --- | --- | --- | --- | --- |
| /screen | 72 | ~0.5 MB (95%↓) | ~5 MB (90%↓) | Noticeable on zoom |
| /ebook | 150 | ~1.5 MB (85%↓) | ~12 MB (76%↓) | Slight on close inspection |
| /printer | 300 | ~4 MB (60%↓) | ~25 MB (50%↓) | Minimal |
| /prepress | 300+ | ~7 MB (30%↓) | ~40 MB (20%↓) | None visible |
| Lossless only | Original | ~8 MB (20%↓) | ~35 MB (30%↓) | None |
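The percentages in the table are plain size reductions relative to the original. A minimal helper for measuring your own results might look like the sketch below; the function name is an assumption, and the inputs are the approximate /ebook figures from the table expressed in bytes.

```python
def reduction_percent(original_bytes, compressed_bytes):
    """Return the size reduction as a whole-number percentage of the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return round(100 * (original_bytes - compressed_bytes) / original_bytes)


# Approximate /ebook figures from the table
print(reduction_percent(10_000_000, 1_500_000))   # 85
print(reduction_percent(50_000_000, 12_000_000))  # 76
```

For real files, os.path.getsize on the input and output paths gives the two byte counts.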

Practical Tips

  1. Check image resolution first: Use PDF Info to see embedded image DPI. A 600 DPI scan viewed on screen wastes more than 90% of its pixels.
  2. Convert CMYK to sRGB: Uncompressed CMYK images are about 33% larger than their RGB equivalents because they store four channels instead of three. Convert to sRGB unless printing professionally.
  3. Remove hidden layers: CAD exports and Illustrator files often contain hidden layers that add significant size.
  4. Flatten transparency: Complex transparency effects increase file size and rendering time.
  5. Strip metadata: Remove XMP metadata, document history, and page thumbnails.
  6. Linearize for web: "Fast web view" reorganizes the PDF so the first page loads before the entire file downloads.
  7. Audit fonts: Check for fully embedded fonts that could be subsetted, or system fonts that don't need embedding at all.
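Two of the figures in the tips above come from simple arithmetic on uncompressed image data, sketched below. The page dimensions and the 96 DPI screen resolution are assumptions for illustration, and real files differ once compression is applied.

```python
def raw_image_bytes(width_px, height_px, channels, bits_per_channel=8):
    """Uncompressed size of an image in bytes."""
    return width_px * height_px * channels * bits_per_channel // 8


# CMYK stores four channels per pixel, RGB stores three.
rgb = raw_image_bytes(2550, 3300, channels=3)
cmyk = raw_image_bytes(2550, 3300, channels=4)
extra = (cmyk - rgb) / rgb
print(f"CMYK overhead: {extra:.0%}")  # CMYK overhead: 33%

# Fraction of a 600 DPI scan's pixels that a 96 DPI screen can show at actual size.
kept = (96 / 600) ** 2
print(f"pixels used on screen: {kept:.1%}")  # pixels used on screen: 2.6%
```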

Frequently Asked Questions

Does compressing a PDF reduce quality?

It depends on the method. Lossless compression (removing metadata, optimizing streams) preserves quality perfectly. Lossy compression (downsampling images, JPEG recompression) reduces quality but achieves much smaller sizes. Choose based on your use case.

What is a good PDF file size for email?

Under 5 MB is a safe target for most recipients. Gmail allows up to 25 MB, but many corporate servers limit attachments to 10 MB. Use screen or ebook quality compression (72-150 DPI) for email attachments.

Why is my PDF so large?

Common causes: high-resolution images (300+ DPI), fully embedded fonts (especially CJK), uncompressed content streams, duplicate resources across pages, and incremental saves that append data instead of rewriting.

Can I compress a PDF without software?

Yes. Online tools like ThePDF's compressor work entirely in your browser, so the file never leaves your device. No installation needed.

What is the difference between PDF optimization and compression?

Compression reduces data size using algorithms (Flate, JPEG). Optimization is broader: it includes compression plus removing unused objects, deduplicating resources, subsetting fonts, and linearizing for web viewing.
