Does compressing a PDF reduce quality?

It depends on the method. Lossless compression (removing metadata, optimizing streams) preserves quality. Lossy compression (downsampling images) reduces quality but achieves smaller sizes.

What is a good PDF file size for email?

Under 5 MB is safe for most email providers, and under 10 MB is usually acceptable. Use screen-quality compression (72-150 DPI images) for email attachments.

Why is my PDF so large?

Common causes: high-resolution images, embedded fonts (especially CJK), uncompressed content streams, duplicate resources, and embedded multimedia.

Can I compress a PDF without software?

Yes. Online tools like ThePDF's compressor work in your browser without installing anything. The file never leaves your device.

What is the difference between PDF optimization and compression?

Compression reduces data size using algorithms. Optimization is broader: it includes compression plus removing unused objects, deduplicating resources, and linearizing for web viewing.

PDF Compression: How to Reduce File Size Without Losing Quality

March 31, 2026 · 12 min read

PDF files have a reputation for ballooning to unwieldy sizes, especially when they contain high-resolution images, embedded fonts, or complex graphics. Whether you're trying to email a document, upload it to a web portal with size restrictions, or simply save storage space, understanding how to compress PDFs effectively is essential.

This comprehensive guide walks you through the technical details of PDF compression, from understanding what makes PDFs large to implementing practical compression strategies that preserve quality. You'll learn about different compression algorithms, command-line tools, and when to use lossy versus lossless techniques.

Table of Contents

Why PDFs Get Large
Understanding Compression Methods
Lossy vs Lossless Compression
Image Optimization Techniques
Font Subsetting and Embedding
Recommended Settings by Use Case
Ghostscript Commands for Compression
Python Libraries and Automation
Compression Comparison and Benchmarks
Practical Tips and Best Practices
Frequently Asked Questions
Related Articles

Why PDFs Get Large

A PDF is fundamentally a container format that can hold multiple types of content: text, images, fonts, vector graphics, JavaScript, multimedia elements, and extensive metadata. Understanding what contributes to file size is the first step toward effective compression.

The PDF specification allows for incredible flexibility, but this comes at a cost. Each element you add increases the file size, and without proper optimization, even simple documents can become surprisingly large.

Source	Typical Impact	Example	Solution
High-resolution images	60-90% of file size	A single 300 DPI photo can be 5-15 MB	Downsample to 150 DPI for screen viewing
Embedded fonts	200 KB - 5 MB per font	CJK fonts can exceed 10 MB each	Use font subsetting to include only used glyphs
Uncompressed streams	2-5x larger than needed	Text and vector data without Flate compression	Apply stream compression during PDF creation
Duplicate resources	Variable	Same image embedded on every page	Reference resources once, reuse across pages
Metadata and thumbnails	100 KB - 2 MB	Page thumbnails, XMP metadata, edit history	Strip unnecessary metadata and thumbnails
Incremental saves	10-50% overhead	Each save appends changes instead of rewriting	Linearize or rewrite the entire PDF structure

Use our PDF Info tool to analyze exactly what is consuming space in your file. This diagnostic step is crucial before applying compression, as it tells you where to focus your optimization efforts.

Pro tip: Images are almost always the primary culprit. If your PDF is over 5 MB, start by examining image resolution and compression settings before worrying about fonts or metadata.
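One of the causes in the table, incremental saves, can be spotted with a few lines of Python. The sketch below rests on the assumption that each incremental save appends its own `%%EOF` marker to the file. Linearized files and PDFs with embedded PDF attachments can also contain extra markers, so treat the count as a hint rather than a diagnosis. The function names and the sample bytes are illustrative only.

```python
def count_revisions(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; an incremental save normally appends one."""
    return pdf_bytes.count(b"%%EOF")


def describe(pdf_bytes: bytes) -> str:
    """Turn the marker count into a short human-readable hint."""
    revisions = count_revisions(pdf_bytes)
    if revisions <= 1:
        return "single revision"
    return f"{revisions} revisions: rewriting the file may reclaim space"


# Synthetic stand-in for a file that was saved twice after it was created.
sample = (
    b"%PDF-1.4\n...original body...\n%%EOF\n"
    b"...first update...\n%%EOF\n"
    b"...second update...\n%%EOF\n"
)
print(describe(sample))
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to `describe`.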

Understanding Compression Methods

PDF compression isn't a single technique but rather a collection of strategies applied to different content types within the document. Each type of content—images, text, fonts, vector graphics—requires a different approach.

Image Downsampling

Downsampling is the most effective compression technique for image-heavy PDFs. It reduces image resolution by decreasing the number of pixels, which directly reduces file size. A 300 DPI image downsampled to 150 DPI becomes roughly one-quarter the pixel count.

There are three primary downsampling methods:

Bicubic downsampling — Provides the best quality by averaging pixel neighborhoods using a cubic function. This method produces smooth gradients and is ideal for photographs and complex images.
Average downsampling — Faster than bicubic, averages pixels in a simpler way. Quality is slightly lower but still acceptable for most use cases.
Subsampling — The fastest method, simply picks the nearest pixel without averaging. Can produce blocky artifacts and should only be used when speed is critical and quality is secondary.
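The difference between averaging and subsampling is easiest to see on a worst-case input. The following is a minimal pure-Python sketch on a tiny grayscale checkerboard, not how any particular PDF tool implements these filters. In this sketch, `subsample` keeps the top-left pixel of each block as a simple stand-in for nearest-pixel selection.

```python
def subsample(pixels, factor):
    """Keep one pixel (the top-left) from every factor x factor block."""
    return [row[::factor] for row in pixels[::factor]]


def average(pixels, factor):
    """Replace every factor x factor block with its mean value."""
    out = []
    for y in range(0, len(pixels) - factor + 1, factor):
        row = []
        for x in range(0, len(pixels[0]) - factor + 1, factor):
            block = [
                pixels[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


# 4x4 grayscale checkerboard: 0 is black, 255 is white.
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]

print(subsample(checker, 2))  # [[0, 0], [0, 0]]: the pattern collapses to black
print(average(checker, 2))    # [[127, 127], [127, 127]]: blocks become mid-gray
```

Subsampling discards three of every four pixels outright, which is why fine patterns can turn into blocky artifacts, while averaging lets every source pixel contribute to the result.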

The resolution you choose depends entirely on the document's intended use. Screen viewing rarely requires more than 150 DPI, while professional printing typically needs 300 DPI or higher.

Image Recompression

After downsampling, you can further reduce size by recompressing images with more efficient codecs. Different image types benefit from different compression algorithms.

Format	Type	Best For	Quality Notes	Typical Compression Ratio
JPEG	Lossy	Photos, scanned documents	Good at quality 75-85	10:1 to 20:1
JPEG2000	Lossy/Lossless	High-quality photos	Better than JPEG at same size	15:1 to 30:1
JBIG2	Lossy/Lossless	Black & white text/scans	Typically several times smaller than CCITT Group 4	50:1 to 100:1
Flate (ZIP)	Lossless	Screenshots, diagrams	Perfect quality, moderate compression	2:1 to 4:1
CCITT Group 4	Lossless	B&W fax-quality scans	Perfect for 1-bit images	10:1 to 20:1

JPEG remains the most widely supported and effective format for color photographs. JPEG2000 offers better compression but has limited support in some PDF readers. For black-and-white documents, JBIG2 is remarkably efficient but requires specialized tools.

Lossy vs Lossless Compression

Understanding the difference between lossy and lossless compression is fundamental to making informed decisions about PDF optimization.

Lossless Compression

Lossless compression reduces file size without discarding any information. When you decompress the file, you get back exactly what you started with, bit for bit. This is essential for documents where accuracy matters.

Common lossless techniques include:

Flate/Deflate compression — The ZIP algorithm, applied to text streams and vector graphics
LZW compression — An older algorithm, less efficient than Flate but still used in some PDFs
Run-length encoding — Efficient for images with large areas of solid color
CCITT Group 4 — Specifically designed for black-and-white fax images

Lossless compression typically achieves 2:1 to 4:1 compression ratios for text and vector content. For images, the ratio depends heavily on image characteristics—screenshots compress well, photographs don't.
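Python's standard `zlib` module implements the same Deflate algorithm that Flate streams use, so it gives a quick feel for these ratios. In this sketch the "content stream" is a deliberately repetitive stand-in, so its ratio is far higher than a real document would reach, and random bytes stand in for data with no redundancy left to exploit.

```python
import os
import zlib

# A repetitive, text-like stand-in for a PDF content stream.
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 200
# Random bytes stand in for data that has no exploitable redundancy.
noise = os.urandom(len(content_stream))


def flate_ratio(data: bytes) -> float:
    """Compress with Deflate, verify the round trip, and return the ratio."""
    compressed = zlib.compress(data, 9)
    # Lossless means the original comes back bit for bit.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


print(f"text-like stream: {flate_ratio(content_stream):.1f}:1")
print(f"random bytes:     {flate_ratio(noise):.2f}:1")
```

The random input ends up marginally larger than it started, which is the same reason photographic data gains little from lossless compression.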

Lossy Compression

Lossy compression achieves much higher compression ratios by permanently discarding information that's less perceptible to human vision. Once applied, you cannot recover the original data.

The key is finding the sweet spot where file size decreases significantly but quality remains acceptable for your use case. A JPEG quality setting of 85 typically provides excellent visual quality while reducing file size by 80-90% compared to uncompressed.

Quick tip: Never apply lossy compression multiple times to the same image. Each compression pass degrades quality further. If you need to recompress, always start from the original uncompressed source if possible.

When to Use Each Type

Choose lossless compression when:

The document contains legal, medical, or financial information requiring perfect accuracy
Text must remain crisp and readable at any zoom level
The PDF will be edited or processed further
You're working with line art, diagrams, or screenshots with text

Choose lossy compression when:

The document is primarily photographs or scanned images
File size is more important than perfect visual fidelity
The document is for screen viewing only, not professional printing
You need to meet strict file size limits (email attachments, web uploads)

Image Optimization Techniques

Since images typically account for 60-90% of PDF file size, optimizing them delivers the biggest impact. Here's a systematic approach to image optimization.

Resolution Guidelines

The appropriate resolution depends entirely on how the PDF will be used:

72-96 DPI — Web viewing, email attachments, mobile devices
150 DPI — General screen viewing, presentations, internal documents
300 DPI — Professional printing, high-quality output
600+ DPI — Fine art reproduction, medical imaging, archival purposes

Most PDFs intended for screen viewing can safely use 150 DPI without any perceptible quality loss. Because halving the resolution quarters the pixel count, this alone can reduce image data by roughly 75% compared to 300 DPI images.
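The arithmetic behind these guidelines can be checked directly. This sketch assumes a single image covering a US Letter page (8.5 x 11 inches) stored as 8-bit RGB. The helper names are illustrative, and the sizes are for uncompressed pixel data, not final file size.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of an image covering the given area at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_rgb_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size at 3 bytes per pixel (8-bit RGB)."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    return width_px * height_px * 3


for dpi in (72, 150, 300, 600):
    width_px, height_px = page_pixels(8.5, 11, dpi)
    megabytes = raw_rgb_bytes(8.5, 11, dpi) / 1_000_000
    print(f"{dpi:>3} DPI: {width_px} x {height_px} px, {megabytes:.1f} MB uncompressed")

saving = 1 - raw_rgb_bytes(8.5, 11, 150) / raw_rgb_bytes(8.5, 11, 300)
print(f"300 -> 150 DPI removes {saving:.0%} of the raw pixel data")
```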

Color Space Optimization

Color images use significantly more data than grayscale or black-and-white. If your document doesn't require color, converting to grayscale can reduce image size by 60-70%.
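For uncompressed samples the saving follows from the channel count: RGB stores three bytes per pixel and grayscale stores one. The sketch below converts a few pixels with the ITU-R BT.601 luminance weights, which is one common choice and not necessarily the formula a given PDF tool uses. Savings on JPEG-compressed images are usually smaller than this raw figure.

```python
def to_gray(rgb_pixels):
    """Convert (r, g, b) tuples to 8-bit luminance using BT.601 weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
gray = to_gray(pixels)

rgb_bytes = len(pixels) * 3   # three 8-bit channels per pixel
gray_bytes = len(gray)        # one 8-bit channel per pixel
saving = 1 - gray_bytes / rgb_bytes

print(gray)                                       # [76, 150, 29, 255]
print(f"raw data saved by grayscale: {saving:.0%}")  # 67%
```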

For documents that are primarily text with occasional color elements, consider:

Converting text pages to black-and-white (1-bit)
Keeping only essential pages in color
Using grayscale instead of color where possible

Our PDF to Images tool can help you extract and analyze individual pages to determine which ones actually need color.

JPEG Quality Settings

JPEG quality is typically specified on a scale from 0-100, though the exact meaning varies by implementation. Here's a practical guide:

90-100 — Minimal compression, very large files, indistinguishable from original
85-89 — Excellent quality, good compression, recommended for most uses
75-84 — Good quality, significant compression, suitable for web and screen viewing
60-74 — Acceptable quality, high compression, minor artifacts may be visible
Below 60 — Poor quality, obvious artifacts, only for thumbnails or previews

For most business documents and presentations, a quality setting of 80-85 provides the best balance between file size and visual quality.

Font Subsetting and Embedding

Fonts can contribute significantly to PDF file size, especially when using multiple typefaces or non-Latin scripts. Understanding font embedding and subsetting is crucial for optimization.

How Font Embedding Works

When you create a PDF, you have three options for handling fonts:

Embed full fonts — Include the entire font file, ensuring perfect rendering but increasing file size
Embed subset fonts — Include only the glyphs (characters) actually used in the document
Don't embed fonts — Rely on the viewer's system fonts, smallest file size but inconsistent rendering

A full font file contains thousands of glyphs covering multiple languages and special characters. If your document uses only 50 characters, subsetting removes the unused glyphs. A 2 MB font might shrink to 30 KB after subsetting.
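A rough way to see how little of a font a document needs is to count its distinct characters. This is only a sketch: characters are not the same thing as glyphs (ligatures and alternate forms complicate the mapping), glyph count is not proportional to byte size, and the 3,000-glyph font is a hypothetical figure chosen for illustration.

```python
def used_characters(text: str) -> set[str]:
    """Distinct non-whitespace characters: a rough proxy for required glyphs."""
    return {ch for ch in text if not ch.isspace()}


def subset_fraction(text: str, glyphs_in_font: int) -> float:
    """Share of the font's glyphs that the text appears to need."""
    return len(used_characters(text)) / glyphs_in_font


document_text = "Quarterly report: revenue up, costs down."
needed = used_characters(document_text)

print(f"{len(needed)} distinct characters needed")
print(f"{subset_fraction(document_text, 3000):.1%} of a hypothetical 3,000-glyph font")
```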

Font Subsetting Best Practices

Modern PDF creation tools automatically subset fonts by default, but you should verify this, especially when working with older software or converting from other formats.

Key considerations:

Always subset fonts unless you have a specific reason not to (like allowing form field text entry)
CJK (Chinese, Japanese, Korean) fonts are particularly large—subsetting is essential
If multiple pages use the same font, the subset is shared across all pages
Subsetting limits text editing in most PDF editors, because characters that were not used in the document are missing from the embedded font; this may be desirable for final documents

Pro tip: If you're creating PDFs programmatically, always enable font subsetting in your library's configuration. This single setting can reduce file size by several megabytes in text-heavy documents.

Standard Fonts

PDF defines 14 "standard fonts" that all PDF readers must support: Times, Helvetica, Courier (each in regular, bold, italic, and bold-italic), Symbol, and ZapfDingbats. Using these fonts eliminates the need for embedding entirely.
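When auditing a document, it can help to check font names against this list. The sketch below uses the PostScript names under which the 14 standard fonts conventionally appear in a PDF's `/BaseFont` entry. The `needs_embedding` helper is an illustrative name, not part of any library.

```python
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def needs_embedding(base_font: str) -> bool:
    """Return True when the font is not one of the 14 standard fonts."""
    return base_font.lstrip("/") not in STANDARD_14


for name in ("/Helvetica-Bold", "/Arial", "/ZapfDingbats"):
    verdict = "embed" if needs_embedding(name) else "standard font"
    print(f"{name}: {verdict}")
```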

However, standard fonts have limitations:

Limited to basic Latin characters
Rendering varies slightly between PDF viewers
No support for advanced typography features
Not suitable for branded documents requiring specific typefaces

Recommended Settings by Use Case

Different use cases require different compression strategies. Here are proven configurations for common scenarios.

Email Attachments (Target: Under 10 MB)

Most email systems have attachment size limits between 10 and 25 MB. For documents intended for email:

Downsample images to 150 DPI
Use JPEG compression at quality 80
Enable font subsetting
Remove metadata and thumbnails
Convert color pages to grayscale where appropriate

Expected compression: 70-85% reduction from original size.

Web Publishing (Target: Fast Loading)

For PDFs hosted on websites, optimize for download speed:

Downsample images to 96-150 DPI
Use JPEG compression at quality 75-80
Enable linearization (fast web view)
Subset all fonts
Remove unnecessary metadata

Expected compression: 80-90% reduction from original size.

Archival Storage (Target: Quality Preservation)

For long-term archival, prioritize quality over file size:

Keep images at 300 DPI or original resolution
Use lossless compression (Flate) for images when possible
If using JPEG, set quality to 90 or higher
Embed full fonts to ensure future compatibility
Preserve all metadata

Expected compression: 20-40% reduction from original size.

Professional Printing (Target: Print Quality)

For documents going to professional printers:

Maintain 300 DPI for images
Use CMYK color space
Embed all fonts (full, not subset)
Use lossless compression or high-quality JPEG (95+)
Include crop marks and bleed if required

Expected compression: 10-30% reduction from original size.

Mobile Viewing (Target: Small File Size)

For documents primarily viewed on mobile devices:

Downsample images to 96-120 DPI
Use aggressive JPEG compression (quality 70-75)
Convert to grayscale if color isn't essential
Subset fonts aggressively
Remove all non-essential metadata

Expected compression: 85-95% reduction from original size.
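For automation, the recommendations above can be captured as data. This is one possible encoding, not a standard schema: the `Profile` class and its field names are illustrative, ranges are stored as (min, max) tuples, and where the lists above do not mention an option (for example metadata handling for print) the value shown is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    image_dpi: tuple[int, int]     # (min, max) target image resolution
    jpeg_quality: tuple[int, int]  # (min, max) JPEG quality
    subset_fonts: bool             # False means embed full fonts
    strip_metadata: bool
    linearize: bool = False


PROFILES = {
    "email":    Profile((150, 150), (80, 80), subset_fonts=True, strip_metadata=True),
    "web":      Profile((96, 150), (75, 80), subset_fonts=True, strip_metadata=True,
                        linearize=True),
    "archival": Profile((300, 300), (90, 100), subset_fonts=False, strip_metadata=False),
    # Metadata handling for print is not specified above; False is an assumption.
    "print":    Profile((300, 300), (95, 100), subset_fonts=False, strip_metadata=False),
    "mobile":   Profile((96, 120), (70, 75), subset_fonts=True, strip_metadata=True),
}

for name, profile in PROFILES.items():
    low, high = profile.image_dpi
    print(f"{name:<9} images at {low}-{high} DPI, subset fonts: {profile.subset_fonts}")
```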

Use our Compress PDF tool to apply these settings automatically based on your selected use case.

Ghostscript Commands for Compression

Ghostscript is a powerful open-source tool for PDF manipulation and compression. It's available for Windows, macOS, and Linux, and provides fine-grained control over compression settings.

Basic Compression Command

The simplest Ghostscript compression command uses predefined settings:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
   -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

The -dPDFSETTINGS parameter accepts these presets:

/screen — Lowest quality, smallest file size (72 DPI images)
/ebook — Medium quality, moderate file size (150 DPI images)
/printer — High quality, larger file size (300 DPI images)
/prepress — Highest quality, largest file size (300 DPI, color preservation)
/default — General-purpose settings intended to work across many uses; output can be larger than with the other presets

Custom Compression Settings

For more control, specify individual parameters:

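gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dDownsampleColorImages=true \
   -dColorImageResolution=150 \
   -dColorImageDownsampleType=/Bicubic \
   -dAutoFilterColorImages=false \
   -dEncodeColorImages=true \
   -dColorImageFilter=/DCTEncode \
   -dDownsampleGrayImages=true \
   -dGrayImageResolution=150 \
   -dGrayImageDownsampleType=/Bicubic \
   -dAutoFilterGrayImages=false \
   -dEncodeGrayImages=true \
   -dGrayImageFilter=/DCTEncode \
   -dDownsampleMonoImages=true \
   -dMonoImageResolution=300 \
   -dMonoImageDownsampleType=/Bicubic \
   -dEncodeMonoImages=true \
   -dMonoImageFilter=/CCITTFaxEncode \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=output.pdf \
   -c "<< /ColorImageDict << /QFactor 0.4 /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /GrayImageDict << /QFactor 0.4 /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> >> setdistillerparams" \
   -f input.pdf

This command:

Downsamples color and grayscale images to 150 DPI using bicubic interpolation
Compresses color and grayscale images with JPEG (DCTEncode); the AutoFilter options are switched off because Ghostscript otherwise picks the image filter itself and ignores ColorImageFilter and GrayImageFilter
Sets JPEG quality through QFactor in ColorImageDict and GrayImageDict, where lower values mean higher quality and larger files (the -dJPEGQ switch belongs to Ghostscript's JPEG output devices and has no effect on pdfwrite)
Downsamples monochrome images to 300 DPI
Compresses monochrome images with CCITT Group 4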

Font Subsetting with Ghostscript

To enable font subsetting:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dSubsetFonts=true \
   -dEmbedAllFonts=true \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=output.pdf input.pdf

The -dSubsetFonts=true parameter ensures only used glyphs are embedded, while -dEmbedAllFonts=true ensures all fonts are embedded (as subsets).

Quick tip: Always test Ghostscript commands on a copy of your PDF first. Some settings can cause unexpected rendering issues with complex documents.

Batch Processing Multiple Files

To compress multiple PDFs in a directory (Linux/macOS):

for file in *.pdf; do
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
     -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
     -sOutputFile="compressed_${file}" "${file}"
done

For Windows PowerShell:

Get-ChildItem *.pdf | ForEach-Object {
  gswin64c -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 `
    -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH `
    -sOutputFile="compressed_$($_.Name)" $_.Name
}
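The same batch loop can be sketched in Python, which works on any platform. One difference from the shell loops above, flagged here as a design choice rather than a requirement: the sketch skips files that already carry the `compressed_` prefix, so running the batch twice does not recompress earlier output. The `plan_batch` helper only decides what to process; the actual Ghostscript call is left to whatever function you already use.

```python
from pathlib import Path

PREFIX = "compressed_"


def plan_batch(directory):
    """Return (source, destination) pairs for every PDF not yet compressed."""
    jobs = []
    for src in sorted(Path(directory).glob("*.pdf")):
        if src.name.startswith(PREFIX):
            continue  # output of an earlier run: do not compress it again
        jobs.append((src, src.with_name(PREFIX + src.name)))
    return jobs


for src, dst in plan_batch("."):
    print(f"{src.name} -> {dst.name}")
```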

Python Libraries and Automation

For developers and power users, Python offers several libraries for PDF compression and manipulation. These are ideal for automating compression workflows or integrating PDF optimization into larger applications.

PyPDF2 and pikepdf

PyPDF2 is a pure-Python library for basic PDF operations (it is now developed under the name pypdf), while pikepdf, which is built on the C++ library qpdf, provides more advanced features with better performance:

import pikepdf

# Open and save with compression
with pikepdf.open('input.pdf') as pdf:
    pdf.save('output.pdf', compress_streams=True)

This applies lossless stream compression but doesn't handle image recompression. For that, you need additional tools.

img2pdf for Image-to-PDF Conversion

When creating PDFs from images, img2pdf produces smaller files than most alternatives:

import img2pdf

with open('output.pdf', 'wb') as f:
    f.write(img2pdf.convert(['image1.jpg', 'image2.jpg']))

It embeds images without recompression, preserving their existing JPEG compression.

Pillow for Image Preprocessing

Before creating a PDF, optimize images with Pillow:

from PIL import Image

img = Image.open('input.jpg')
# Resize to 150 DPI equivalent (assuming original is 300 DPI)
img = img.resize((img.width // 2, img.height // 2), Image.BICUBIC)
# Save with JPEG quality 85
img.save('output.jpg', 'JPEG', quality=85, optimize=True)

Calling Ghostscript from Python

For maximum control, call Ghostscript directly from Python:

import subprocess

def compress_pdf(input_path, output_path, quality='ebook'):
    subprocess.run([
        'gs',
        '-sDEVICE=pdfwrite',
        '-dCompatibilityLevel=1.4',
        f'-dPDFSETTINGS=/{quality}',
        '-dNOPAUSE',
        '-dQUIET',
        '-dBATCH',
        f'-sOutputFile={output_path}',
        input_path
    ], check=True)

compress_pdf('input.pdf', 'output.pdf', 'ebook')
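A slightly more defensive variant is sketched below, under a few assumptions: the executable is called `gs` or `gswin64c` and is on the PATH, and an output that is not smaller than the input should be discarded in favor of the original. The function names `build_gs_command` and `compress_if_smaller` are illustrative.

```python
import os
import shutil
import subprocess

PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def build_gs_command(input_path, output_path, preset="ebook", executable="gs"):
    """Build the Ghostscript argument list, rejecting unknown presets early."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        executable,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={output_path}",
        input_path,
    ]


def compress_if_smaller(input_path, output_path, preset="ebook"):
    """Run Ghostscript; fall back to a plain copy if the result is not smaller."""
    executable = shutil.which("gs") or shutil.which("gswin64c")
    if executable is None:
        raise FileNotFoundError("Ghostscript was not found on PATH")
    command = build_gs_command(input_path, output_path, preset, executable)
    subprocess.run(command, check=True)
    if os.path.getsize(output_path) >= os.path.getsize(input_path):
        shutil.copyfile(input_path, output_path)
        return False
    return True
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.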

Complete Compression Script

Here's a complete Python script that compresses a PDF with custom settings:

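import io

import pikepdf
from PIL import Image

def compress_pdf(input_path, output_path, image_quality=85, max_dpi=150):
    with pikepdf.open(input_path) as pdf:
        seen = set()  # recompress images shared between pages only once

        for page in pdf.pages:
            # A PDF image has no DPI of its own; resolution depends on how
            # large it is drawn. Assumption: each image spans the full page
            # width, which holds for scans but overestimates small figures.
            box = page.mediabox
            page_width_in = (float(box[2]) - float(box[0])) / 72  # 72 pt per inch

            for image_key in page.images.keys():
                raw_image = page.images[image_key]
                if raw_image.objgen in seen:
                    continue
                seen.add(raw_image.objgen)

                # JPEG cannot store transparency, so leave masked images alone
                if (raw_image.get('/SMask') is not None
                        or raw_image.get('/Mask') is not None):
                    continue

                try:
                    pil_image = pikepdf.PdfImage(raw_image).as_pil_image()
                except Exception:
                    continue  # encoding pikepdf cannot decode: keep as-is

                if pil_image.mode == 'P':
                    pil_image = pil_image.convert('RGB')
                if pil_image.mode not in ('L', 'RGB'):
                    continue  # skip 1-bit, CMYK, and other modes

                # Downsample when the estimated DPI exceeds the target
                dpi = pil_image.width / page_width_in
                if dpi > max_dpi:
                    scale = max_dpi / dpi
                    new_size = (max(1, int(pil_image.width * scale)),
                                max(1, int(pil_image.height * scale)))
                    pil_image = pil_image.resize(new_size, Image.BICUBIC)

                # Compress as JPEG
                img_byte_arr = io.BytesIO()
                pil_image.save(img_byte_arr, format='JPEG',
                               quality=image_quality, optimize=True)
                data = img_byte_arr.getvalue()

                # Keep the original when recompression does not save space
                if len(data) >= len(raw_image.read_raw_bytes()):
                    continue

                # Replace the image data and update its dictionary to match
                raw_image.write(data, filter=pikepdf.Name.DCTDecode)
                raw_image.Width, raw_image.Height = pil_image.size
                raw_image.BitsPerComponent = 8
                raw_image.ColorSpace = (pikepdf.Name.DeviceGray
                                        if pil_image.mode == 'L'
                                        else pikepdf.Name.DeviceRGB)

        pdf.save(output_path, compress_streams=True)

compress_pdf('input.pdf', 'output.pdf')

This script opens a PDF, visits each image once, estimates its resolution from the page width, downsamples anything above 150 DPI, recompresses it as JPEG at quality 85, and saves the result with stream compression enabled. Images with transparency masks, images in color modes other than grayscale and RGB, and images that would not get smaller are left untouched. Because the DPI estimate assumes full-width images, test the output on documents with small inline figures before relying on it.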

Compression Comparison and Benchmarks

Understanding the trade-offs between different compression settings helps you make informed decisions. Here are real-world benchmarks from compressing various document types.

Sample Document Compression Results

Document Type	Original Size	Screen (72 DPI)	Ebook (150 DPI)	Printer (300 DPI)	Quality Impact
Photo-heavy brochure	45 MB	3.2 MB (93% reduction)	8.5 MB (81% reduction)	22 MB (51% reduction)	Screen: noticeable, Ebook: minimal, Printer: none
Scanned text document	28 MB	2.1 MB (92% reduction)	4.8 MB (83% reduction)	12 MB (57% reduction)	Screen: acceptable, Ebook: good, Printer: excellent
Technical manual with diagrams	18 MB	2.8 MB (84% reduction)	5.2 MB (71% reduction)	9.5 MB (47% reduction)	Screen: good, Ebook: excellent, Printer: excellent
Presentation slides	35 MB	4.1 MB (88% reduction)	7.8 MB (78% reduction)	16 MB (54% reduction)	Screen: excellent, Ebook: excellent, Printer: good
Form with minimal images	5 MB	0.8 MB (84% reduction)	1.2 MB (76% reduction)	2.1 MB (58% reduction)