PDF Compression: How to Reduce File Size Without Losing Quality


PDF files have a reputation for ballooning to unwieldy sizes, especially when they contain high-resolution images, embedded fonts, or complex graphics. Whether you're trying to email a document, upload it to a web portal with size restrictions, or simply save storage space, understanding how to compress PDFs effectively is essential.

This comprehensive guide walks you through the technical details of PDF compression, from understanding what makes PDFs large to implementing practical compression strategies that preserve quality. You'll learn about different compression algorithms, command-line tools, and when to use lossy versus lossless techniques.


Why PDFs Get Large

A PDF is fundamentally a container format that can hold multiple types of content: text, images, fonts, vector graphics, JavaScript, multimedia elements, and extensive metadata. Understanding what contributes to file size is the first step toward effective compression.

The PDF specification allows for incredible flexibility, but this comes at a cost. Each element you add increases the file size, and without proper optimization, even simple documents can become surprisingly large.

| Source | Typical Impact | Example | Solution |
|---|---|---|---|
| High-resolution images | 60-90% of file size | A single 300 DPI photo can be 5-15 MB | Downsample to 150 DPI for screen viewing |
| Embedded fonts | 200 KB - 5 MB per font | CJK fonts can exceed 10 MB each | Use font subsetting to include only used glyphs |
| Uncompressed streams | 2-5x larger than needed | Text and vector data without Flate compression | Apply stream compression during PDF creation |
| Duplicate resources | Variable | Same image embedded on every page | Reference resources once, reuse across pages |
| Metadata and thumbnails | 100 KB - 2 MB | Page thumbnails, XMP metadata, edit history | Strip unnecessary metadata and thumbnails |
| Incremental saves | 10-50% overhead | Each save appends changes instead of rewriting | Linearize or rewrite the entire PDF structure |

Use our PDF Info tool to analyze exactly what is consuming space in your file. This diagnostic step is crucial before applying compression, as it tells you where to focus your optimization efforts.

Pro tip: Images are almost always the primary culprit. If your PDF is over 5 MB, start by examining image resolution and compression settings before worrying about fonts or metadata.

Understanding Compression Methods

PDF compression isn't a single technique but rather a collection of strategies applied to different content types within the document. Each type of content—images, text, fonts, vector graphics—requires a different approach.

Image Downsampling

Downsampling is the most effective compression technique for image-heavy PDFs. It reduces image resolution by decreasing the number of pixels, which directly reduces file size. A 300 DPI image downsampled to 150 DPI becomes roughly one-quarter the pixel count.

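There are three primary downsampling methods:

Subsampling: keeps one pixel from each block and discards the rest; fastest, but prone to jagged edges and moiré patterns
Averaging: replaces each block of pixels with their average value; smoother results at a modest speed cost
Bicubic: computes each new pixel as a weighted average of the surrounding 4x4 neighborhood; smoothest results, slowest to process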

The resolution you choose depends entirely on the document's intended use. Screen viewing rarely requires more than 150 DPI, while professional printing typically needs 300 DPI or higher.
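The effect of the reduction method is easiest to see on a tiny image. Below is a minimal pure-Python sketch of two methods, subsampling (keep one pixel per block) and averaging (take each block's mean), applied to a grayscale image stored as a list of rows. It assumes an integer reduction factor (2 for 300 to 150 DPI) and drops any leftover edge pixels; the function names are hypothetical, and real tools work on packed image data and support arbitrary ratios.

```python
# Illustrative sketch; function names are hypothetical.

def downsample_subsample(pixels, factor):
    """Keep the top-left pixel of each factor x factor block."""
    height, width = len(pixels), len(pixels[0])
    return [[pixels[y][x] for x in range(0, width - width % factor, factor)]
            for y in range(0, height - height % factor, factor)]


def downsample_average(pixels, factor):
    """Replace each factor x factor block with its integer mean."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        result.append(row)
    return result


# A 4x4 black/white checkerboard, the worst case for subsampling
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
print(downsample_subsample(checker, 2))  # [[0, 0], [0, 0]]
print(downsample_average(checker, 2))    # [[127, 127], [127, 127]]
```

Halving the resolution leaves 4 of the 16 pixels. Subsampling turns the fine pattern solid black, an aliasing artifact, while averaging keeps its overall mid-gray tone.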

Image Recompression

After downsampling, you can further reduce size by recompressing images with more efficient codecs. Different image types benefit from different compression algorithms.

| Format | Type | Best For | Quality Notes | Typical Compression Ratio |
|---|---|---|---|---|
| JPEG | Lossy | Photos, scanned documents | Good at quality 75-85 | 10:1 to 20:1 |
| JPEG2000 | Lossy/Lossless | High-quality photos | Better quality than JPEG at the same file size | 15:1 to 30:1 |
| JBIG2 | Lossy/Lossless | Black & white text/scans | Often several times smaller than CCITT Group 4 | 50:1 to 100:1 |
| Flate (ZIP) | Lossless | Screenshots, diagrams | Perfect quality, moderate compression | 2:1 to 4:1 |
| CCITT Group 4 | Lossless | B&W fax-quality scans | Perfect for 1-bit images | 10:1 to 20:1 |

JPEG remains the most widely supported and effective format for color photographs. JPEG2000 offers better compression but has limited support in some PDF readers. For black-and-white documents, JBIG2 is remarkably efficient but requires specialized tools.

Lossy vs Lossless Compression

Understanding the difference between lossy and lossless compression is fundamental to making informed decisions about PDF optimization.

Lossless Compression

Lossless compression reduces file size without discarding any information. When you decompress the file, you get back exactly what you started with, bit for bit. This is essential for documents where accuracy matters.

Common lossless techniques include:

Lossless compression typically achieves 2:1 to 4:1 compression ratios for text and vector content. For images, the ratio depends heavily on image characteristics—screenshots compress well, photographs don't.
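Flate, the lossless filter PDF uses for content streams, decodes zlib/Deflate data, so Python's standard zlib module can demonstrate both halves of the claim: the bytes come back unchanged, and repetitive content shrinks. The content stream below is invented for illustration; its heavy repetition makes it compress better than a typical real page.

```python
import zlib

# Invented page content stream: text-drawing and line-drawing operators
content = b"".join(
    b"BT /F1 12 Tf 72 %d Td (Line %d of the report) Tj ET\n" % (720 - 14 * i, i)
    for i in range(50)
)
content += b"0.5 w 72 72 m 540 72 l S\n" * 20

compressed = zlib.compress(content, 9)  # zlib format, which FlateDecode reads
restored = zlib.decompress(compressed)

assert restored == content  # lossless: bit-for-bit identical
print(f"{len(content)} bytes -> {len(compressed)} bytes "
      f"({len(content) / len(compressed):.1f}:1)")
```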

Lossy Compression

Lossy compression achieves much higher compression ratios by permanently discarding information that's less perceptible to human vision. Once applied, you cannot recover the original data.

The key is finding the sweet spot where file size decreases significantly but quality remains acceptable for your use case. A JPEG quality setting of 85 typically provides excellent visual quality while reducing file size by 80-90% compared to uncompressed.

Quick tip: Never apply lossy compression multiple times to the same image. Each compression pass degrades quality further. If you need to recompress, always start from the original uncompressed source if possible.

When to Use Each Type

Choose lossless compression when:

Choose lossy compression when:

Image Optimization Techniques

Since images typically account for 60-90% of PDF file size, optimizing them delivers the biggest impact. Here's a systematic approach to image optimization.

Resolution Guidelines

The appropriate resolution depends entirely on how the PDF will be used:

Most PDFs intended for screen viewing can safely use 150 DPI without any perceptible quality loss. Going from 300 to 150 DPI cuts each image's pixel count by 75%, and image data shrinks roughly in proportion.
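The 75% figure follows from pixel count growing with the square of the resolution. A small sketch (the helper name is hypothetical) estimating the uncompressed size of a full-page 8.5 x 11 inch RGB image, assuming 8 bits per channel and no compression, which is the upper bound that the compression filters then reduce:

```python
# Illustrative sketch; the helper name is hypothetical.

def raw_image_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Uncompressed size of a raster image drawn at width_in x height_in inches."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


full_page_300 = raw_image_bytes(8.5, 11, 300)
full_page_150 = raw_image_bytes(8.5, 11, 150)
print(full_page_300, full_page_150)       # 25245000 6311250
print(1 - full_page_150 / full_page_300)  # 0.75
```

Halving the DPI halves both dimensions, so the raw data drops to a quarter regardless of page size.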

Color Space Optimization

Color images use significantly more data than grayscale or black-and-white. If your document doesn't require color, converting to grayscale can reduce image size by 60-70%.
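The saving comes from storing one value per pixel instead of three. A minimal sketch (the function name is hypothetical) of the per-pixel conversion using the ITU-R 601-2 luma weights that Pillow documents for `convert('L')`; Pillow's own integer arithmetic may round differently in edge cases.

```python
# Illustrative sketch; the function name is hypothetical.

def rgb_to_gray(pixel):
    """Weighted luma: green contributes most, blue least (ITU-R 601-2)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


rgb_row = [(255, 255, 255), (255, 0, 0), (0, 128, 0), (0, 0, 255)]
gray_row = [rgb_to_gray(p) for p in rgb_row]

# Raw storage: three bytes per RGB pixel versus one per grayscale pixel
rgb_bytes = bytes(channel for pixel in rgb_row for channel in pixel)
gray_bytes = bytes(gray_row)
print(gray_row)                         # [255, 76, 75, 29]
print(len(rgb_bytes), len(gray_bytes))  # 12 4
```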

For documents that are primarily text with occasional color elements, consider:

Our PDF to Images tool can help you extract and analyze individual pages to determine which ones actually need color.

JPEG Quality Settings

JPEG quality is typically specified on a scale from 0-100, though the exact meaning varies by implementation. Here's a practical guide:

For most business documents and presentations, a quality setting of 80-85 provides the best balance between file size and visual quality.
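In the IJG libjpeg library, whose scaling libjpeg-turbo (and therefore Pillow) follows, the quality number is not a percentage of anything; it scales the quantization tables that divide each 8x8 block's frequency coefficients. A sketch of that scaling applied to the first row of the JPEG specification's example luminance table (helper names are hypothetical):

```python
# Illustrative sketch; helper names are hypothetical.
# First row of the example luminance quantization table in the JPEG
# specification (ITU-T T.81, Annex K, Table K.1).
BASE_LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]


def quality_scale(quality):
    """libjpeg's mapping from a 1-100 quality setting to a percentage scale."""
    quality = min(max(quality, 1), 100)
    return 5000 // quality if quality < 50 else 200 - 2 * quality


def scaled_table(base, quality):
    """Scale quantization divisors for a quality, clamped to baseline 1..255."""
    scale = quality_scale(quality)
    return [min(max((q * scale + 50) // 100, 1), 255) for q in base]


for q in (50, 75, 85, 95):
    print(q, scaled_table(BASE_LUMA_ROW, q))
```

Each coefficient is divided by its table entry and rounded, so smaller divisors keep more detail: at quality 85 the divisors are about a third of their quality-50 values, and at 95 every divisor in this row is below 10.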

Font Subsetting and Embedding

Fonts can contribute significantly to PDF file size, especially when using multiple typefaces or non-Latin scripts. Understanding font embedding and subsetting is crucial for optimization.

How Font Embedding Works

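When you create a PDF, you have three options for handling fonts:

Full embedding: the complete font file, with every glyph, is stored in the PDF
Subset embedding: only the glyphs the document actually uses are stored
No embedding: the PDF refers to the font by name, and the viewer substitutes a locally installed font when it lacks the original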

A full font file contains thousands of glyphs covering multiple languages and special characters. If your document uses only 50 characters, subsetting removes the unused glyphs. A 2 MB font might shrink to 30 KB after subsetting.
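Subsetting starts from the set of characters the document actually draws. A toy sketch of that first step (the function name is hypothetical); production subsetters such as the fontTools `subset` module then copy only the matching glyphs into a new, smaller font.

```python
# Illustrative sketch; the function name is hypothetical.

def glyphs_needed(text_runs):
    """Distinct characters a subset font must keep for the given text.

    Simplification: assumes one glyph per character. Real subsetters map
    characters to glyph IDs through the font's cmap and may also need to
    keep ligatures, composite-glyph components, and the .notdef glyph.
    """
    needed = set()
    for run in text_runs:
        needed.update(run)
    return needed


runs = ["Quarterly Report", "Revenue grew 12% in Q3."]
used = glyphs_needed(runs)
print(len(used), repr("".join(sorted(used))))
```

A font covering thousands of glyphs would be cut down to the 22 distinct characters these two lines use.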

Font Subsetting Best Practices

Modern PDF creation tools automatically subset fonts by default, but you should verify this, especially when working with older software or converting from other formats.
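One way to verify subsetting: the PDF specification requires a subset font's /BaseFont name to begin with a tag of exactly six uppercase letters followed by a plus sign (for example `ABCDEF+Helvetica`). Poppler's `pdffonts` reports the same information in its "sub" column. A minimal sketch of the check (helper names are hypothetical), given names read from each font dictionary with any PDF library:

```python
import re

# Illustrative sketch; helper names are hypothetical.
# Subset tag: six uppercase letters and "+", optionally after the name's "/"
SUBSET_TAG = re.compile(r"^/?[A-Z]{6}\+")


def is_subset_font(base_font):
    """True if a /BaseFont name carries a subset tag like 'ABCDEF+'."""
    return SUBSET_TAG.match(base_font) is not None


def strip_subset_tag(base_font):
    """Return the font name without its subset tag or leading slash."""
    return SUBSET_TAG.sub("", base_font).lstrip("/")


for name in ["/ABCDEF+Helvetica", "/Helvetica", "/QWERTY+NotoSansCJKjp-Regular"]:
    print(name, is_subset_font(name), strip_subset_tag(name))
```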

Key considerations:

Pro tip: If you're creating PDFs programmatically, always enable font subsetting in your library's configuration. This single setting can reduce file size by several megabytes in text-heavy documents.

Standard Fonts

PDF defines 14 "standard fonts" that all PDF readers must support: Times (Roman, Bold, Italic, and BoldItalic), Helvetica and Courier (each in regular, Bold, Oblique, and BoldOblique), Symbol, and ZapfDingbats. Using these fonts eliminates the need for embedding entirely.
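Because the 14 names are fixed, deciding whether a font can be left unembedded is a set lookup on its /BaseFont name. A minimal sketch (the helper name is hypothetical); it accepts only the exact standard names, so metric-compatible substitutes such as Arial are reported as needing embedding.

```python
# Illustrative sketch; the helper name is hypothetical.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}


def is_standard_14(base_font):
    """True if a /BaseFont name refers to one of the 14 standard fonts."""
    return base_font.lstrip("/") in STANDARD_14


for name in ["/Helvetica-Bold", "/Times-Italic", "/Arial", "/Helvetica-Italic"]:
    print(name, is_standard_14(name))
```

Note the last name: Helvetica's slanted style is called Oblique, so "Helvetica-Italic" is not a standard font.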

However, standard fonts have limitations:

Compression Strategies by Use Case

Different use cases require different compression strategies. Here are proven configurations for common scenarios.

Email Attachments (Target: Under 10 MB)

Most email systems have attachment size limits between 10-25 MB. For documents intended for email:

Expected compression: 70-85% reduction from original size.
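Email attachments travel base64-encoded inside MIME messages, which turns every 3 bytes into 4 characters, so a PDF grows by about a third in transit. Providers differ in whether their limit applies to the encoded size; the sketch below (hypothetical helper names) assumes it does and ignores MIME headers and line breaks.

```python
# Illustrative sketch; helper names are hypothetical.

def encoded_size(num_bytes):
    """Size after base64 encoding: 4 output bytes per 3-byte group, padded."""
    return 4 * ((num_bytes + 2) // 3)


def max_pdf_size(limit_bytes):
    """Largest raw PDF whose base64 form still fits within limit_bytes."""
    return (limit_bytes // 4) * 3


limit = 25 * 1024 * 1024        # a 25 MB attachment limit
print(max_pdf_size(limit))      # 19660800, about 18.75 MB of PDF
print(encoded_size(10_000_000)) # 13333336
```

This is one reason to aim well below a provider's nominal limit.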

Web Publishing (Target: Fast Loading)

For PDFs hosted on websites, optimize for download speed:

Expected compression: 80-90% reduction from original size.
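For web delivery, linearized ("Fast Web View") files let a viewer display the first page before the download finishes; qpdf can produce them with `qpdf --linearize input.pdf output.pdf`. The PDF specification requires the linearization parameter dictionary to sit within the first 1024 bytes of the file, so a quick check only needs to read that far. This is a heuristic sketch with hypothetical helper names, not a validator; `qpdf --check-linearization` performs a full check.

```python
# Illustrative sketch; helper names are hypothetical.

def is_linearized_head(head):
    """True if the leading bytes contain a linearization dictionary key."""
    return b"/Linearized" in head[:1024]


def looks_linearized(path):
    """Heuristic check that reads only the first 1024 bytes of the file."""
    with open(path, "rb") as f:
        return is_linearized_head(f.read(1024))


sample = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 5000 >>\nendobj\n"
print(is_linearized_head(sample))  # True
```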

Archival Storage (Target: Quality Preservation)

For long-term archival, prioritize quality over file size:

Expected compression: 20-40% reduction from original size.

Professional Printing (Target: Print Quality)

For documents going to professional printers:

Expected compression: 10-30% reduction from original size.

Mobile Viewing (Target: Small File Size)

For documents primarily viewed on mobile devices:

Expected compression: 85-95% reduction from original size.

Use our Compress PDF tool to apply these settings automatically based on your selected use case.

Ghostscript Commands for Compression

Ghostscript is a powerful open-source tool for PDF manipulation and compression. It's available for Windows, macOS, and Linux, and provides fine-grained control over compression settings.

Basic Compression Command

The simplest Ghostscript compression command uses predefined settings:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
   -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

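The -dPDFSETTINGS parameter accepts these presets:

/screen: low-resolution output (72 DPI images) and the smallest files, for on-screen viewing
/ebook: medium-resolution output (150 DPI images), a good balance for sharing
/printer: 300 DPI images, similar to Acrobat Distiller's Print Optimized setting
/prepress: 300 DPI images, similar to Distiller's Prepress Optimized setting, for professional print production
/default: general-purpose output intended to be useful across many uses, possibly at the cost of larger files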

Custom Compression Settings

For more control, specify individual parameters:

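gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dDownsampleColorImages=true \
   -dColorImageResolution=150 \
   -dColorImageDownsampleType=/Bicubic \
   -dEncodeColorImages=true \
   -dAutoFilterColorImages=false \
   -dColorImageFilter=/DCTEncode \
   -dDownsampleGrayImages=true \
   -dGrayImageResolution=150 \
   -dGrayImageDownsampleType=/Bicubic \
   -dEncodeGrayImages=true \
   -dAutoFilterGrayImages=false \
   -dGrayImageFilter=/DCTEncode \
   -dDownsampleMonoImages=true \
   -dMonoImageResolution=300 \
   -dMonoImageDownsampleType=/Bicubic \
   -dEncodeMonoImages=true \
   -dMonoImageFilter=/CCITTFaxEncode \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=output.pdf input.pdf

This command:

downsamples color and grayscale images to 150 DPI using bicubic resampling
encodes color and grayscale images as JPEG (DCTEncode); the AutoFilter switches are set to false because, when they are true (the default), Ghostscript picks the image filter itself and ignores ColorImageFilter and GrayImageFilter
downsamples 1-bit monochrome images to 300 DPI and encodes them with CCITT fax compression (CCITTFaxEncode)

JPEG quality is not set on this command line: the pdfwrite device takes it from its distiller parameters (the QFactor entries of ColorImageDict and GrayImageDict), while the -dJPEGQ switch applies only to Ghostscript's JPEG image output devices.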

Font Subsetting with Ghostscript

To enable font subsetting:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dSubsetFonts=true \
   -dEmbedAllFonts=true \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=output.pdf input.pdf

The -dSubsetFonts=true parameter ensures only used glyphs are embedded, while -dEmbedAllFonts=true ensures all fonts are embedded (as subsets).

Quick tip: Always test Ghostscript commands on a copy of your PDF first. Some settings can cause unexpected rendering issues with complex documents.

Batch Processing Multiple Files

To compress multiple PDFs in a directory (Linux/macOS):

for file in *.pdf; do
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
     -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
     -sOutputFile="compressed_${file}" "${file}"
done

For Windows PowerShell:

Get-ChildItem *.pdf | ForEach-Object {
  gswin64c -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 `
    -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH `
    -sOutputFile="compressed_$($_.Name)" $_.Name
}
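Run twice, the loops above also pick up their own compressed_ outputs, and Ghostscript can sometimes produce a file larger than its input (for example, when the input is already well optimized). A minimal Python sketch that guards against both, assuming a Ghostscript executable named gs (pass "gswin64c" on Windows); the helper names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Illustrative sketch; helper names are hypothetical.
PREFIX = "compressed_"


def should_process(path):
    """Skip non-PDFs and outputs from an earlier run."""
    path = Path(path)
    return path.suffix.lower() == ".pdf" and not path.name.startswith(PREFIX)


def compress_if_smaller(src, gs="gs", preset="ebook"):
    """Run Ghostscript on src; keep the result only if it is actually smaller.

    Returns the path to use: the new compressed copy, or src itself.
    """
    src = Path(src)
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / src.name
        subprocess.run(
            [gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
             f"-dPDFSETTINGS=/{preset}", "-dNOPAUSE", "-dQUIET", "-dBATCH",
             f"-sOutputFile={out}", str(src)],
            check=True,
        )
        if out.stat().st_size >= src.stat().st_size:
            return src  # re-encoding did not help; keep the original
        dest = src.with_name(PREFIX + src.name)
        shutil.move(str(out), str(dest))
        return dest


def compress_directory(directory, gs="gs"):
    """Compress every eligible PDF in directory; return {source: result}."""
    return {p: compress_if_smaller(p, gs=gs)
            for p in sorted(Path(directory).iterdir()) if should_process(p)}
```

For example, `compress_directory(".")` processes the current directory and reports, for each source file, which file to keep.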

Python Libraries and Automation

For developers and power users, Python offers several libraries for PDF compression and manipulation. These are ideal for automating compression workflows or integrating PDF optimization into larger applications.

PyPDF2 and pikepdf

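PyPDF2 is a pure-Python library for basic PDF operations (its development has since continued under the name pypdf), while pikepdf, a Python wrapper around the C++ QPDF library, provides more advanced features with better performance: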

import pikepdf

# Open and save with compression
with pikepdf.open('input.pdf') as pdf:
    pdf.save('output.pdf', compress_streams=True)

This applies lossless stream compression but doesn't handle image recompression. For that, you need additional tools.

img2pdf for Image-to-PDF Conversion

When creating PDFs from images, img2pdf produces smaller files than most alternatives:

import img2pdf

with open('output.pdf', 'wb') as f:
    f.write(img2pdf.convert(['image1.jpg', 'image2.jpg']))

It embeds images without recompression, preserving their existing JPEG compression.

Pillow for Image Preprocessing

Before creating a PDF, optimize images with Pillow:

from PIL import Image

img = Image.open('input.jpg')
# Resize to 150 DPI equivalent (assuming original is 300 DPI)
img = img.resize((img.width // 2, img.height // 2), Image.BICUBIC)
# Save with JPEG quality 85
img.save('output.jpg', 'JPEG', quality=85, optimize=True)

Calling Ghostscript from Python

For maximum control, call Ghostscript directly from Python:

import subprocess

def compress_pdf(input_path, output_path, quality='ebook'):
    subprocess.run([
        'gs',
        '-sDEVICE=pdfwrite',
        '-dCompatibilityLevel=1.4',
        f'-dPDFSETTINGS=/{quality}',
        '-dNOPAUSE',
        '-dQUIET',
        '-dBATCH',
        f'-sOutputFile={output_path}',
        input_path
    ], check=True)

compress_pdf('input.pdf', 'output.pdf', 'ebook')
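The function above passes its quality argument straight to Ghostscript, so a typo only surfaces as a Ghostscript error, and it hard-codes gs, which is installed as gswin64c or gswin32c on Windows. A sketch of two small helpers that address both (hypothetical names; the preset list mirrors Ghostscript's documented -dPDFSETTINGS values):

```python
import shutil

# Illustrative sketch; helper names are hypothetical.
PDF_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def find_ghostscript():
    """Return the first Ghostscript command found on PATH, or None."""
    for name in ("gs", "gswin64c", "gswin32c"):
        if shutil.which(name):
            return name
    return None


def build_gs_command(input_path, output_path, quality="ebook", gs="gs"):
    """Build the same argument list as compress_pdf, rejecting unknown presets."""
    if quality not in PDF_PRESETS:
        raise ValueError(f"unknown -dPDFSETTINGS preset: {quality!r}; "
                         f"expected one of {', '.join(PDF_PRESETS)}")
    return [gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
            f"-dPDFSETTINGS=/{quality}", "-dNOPAUSE", "-dQUIET", "-dBATCH",
            f"-sOutputFile={output_path}", str(input_path)]
```

Usage: `subprocess.run(build_gs_command('input.pdf', 'output.pdf', gs=find_ghostscript() or 'gs'), check=True)`.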

Complete Compression Script

Here's a complete Python script that compresses a PDF with custom settings:

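import io

import pikepdf
from PIL import Image


def compress_pdf(input_path, output_path, image_quality=85, max_dpi=150):
    with pikepdf.open(input_path) as pdf:
        seen = set()  # shared images are recompressed only once
        for page in pdf.pages:
            # PDF images store no DPI; it depends on how large they are drawn.
            # Assume each image is drawn no wider than the page, so pixels
            # divided by page width is a lower bound on its effective DPI.
            box = page.mediabox
            page_width_in = (float(box[2]) - float(box[0])) / 72
            for raw_image in page.images.values():
                if raw_image.objgen in seen:
                    continue
                seen.add(raw_image.objgen)
                if raw_image.get('/SMask') is not None:
                    continue  # its transparency mask would need resizing too
                try:
                    pil_image = pikepdf.PdfImage(raw_image).as_pil_image()
                except Exception:
                    continue  # encodings this setup cannot decode
                if pil_image.mode == '1':
                    continue  # leave bilevel images to CCITT/JBIG2, not JPEG
                pil_image = pil_image.convert('L' if pil_image.mode == 'L' else 'RGB')

                # Downsample only if the estimated DPI exceeds the target
                dpi = pil_image.width / page_width_in
                if dpi > max_dpi:
                    scale = max_dpi / dpi
                    new_size = (max(1, int(pil_image.width * scale)),
                                max(1, int(pil_image.height * scale)))
                    pil_image = pil_image.resize(new_size, Image.BICUBIC)

                # Compress as JPEG
                img_byte_arr = io.BytesIO()
                pil_image.save(img_byte_arr, format='JPEG',
                               quality=image_quality, optimize=True)

                # Replace the stream, then update the image dictionary to match
                raw_image.write(img_byte_arr.getvalue(),
                                filter=pikepdf.Name.DCTDecode)
                raw_image.Width, raw_image.Height = pil_image.size
                raw_image.ColorSpace = (pikepdf.Name.DeviceGray
                                        if pil_image.mode == 'L'
                                        else pikepdf.Name.DeviceRGB)
                raw_image.BitsPerComponent = 8
                if '/Decode' in raw_image:
                    del raw_image['/Decode']
        pdf.save(output_path, compress_streams=True)


compress_pdf('input.pdf', 'output.pdf')

This script opens a PDF, visits each image that a page references directly (images nested inside form XObjects are not reached), downsamples any image whose estimated resolution exceeds 150 DPI, recompresses it as JPEG at quality 85, and saves the result with stream compression enabled. Images with transparency masks, 1-bit images, and images that cannot be decoded are left untouched, and images shared between pages are recompressed only once.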

Compression Comparison and Benchmarks

Understanding the trade-offs between different compression settings helps you make informed decisions. Here are real-world benchmarks from compressing various document types.

Sample Document Compression Results

| Document Type | Original Size | Screen (72 DPI) | Ebook (150 DPI) | Printer (300 DPI) | Quality Impact |
|---|---|---|---|---|---|
| Photo-heavy brochure | 45 MB | 3.2 MB (93% reduction) | 8.5 MB (81% reduction) | 22 MB (51% reduction) | Screen: noticeable, Ebook: minimal, Printer: none |
| Scanned text document | 28 MB | 2.1 MB (92% reduction) | 4.8 MB (83% reduction) | 12 MB (57% reduction) | Screen: acceptable, Ebook: good, Printer: excellent |
| Technical manual with diagrams | 18 MB | 2.8 MB (84% reduction) | 5.2 MB (71% reduction) | 9.5 MB (47% reduction) | Screen: good, Ebook: excellent, Printer: excellent |
| Presentation slides | 35 MB | 4.1 MB (88% reduction) | 7.8 MB (78% reduction) | 16 MB (54% reduction) | Screen: excellent, Ebook: excellent, Printer: good |
| Form with minimal images | 5 MB | 0.8 MB (84% reduction) | 1.2 MB (76% reduction) | 2.1 MB (58% reduction) | |