PDF Compression: How to Reduce File Size Without Losing Quality
PDF files have a reputation for ballooning to unwieldy sizes, especially when they contain high-resolution images, embedded fonts, or complex graphics. Whether you're trying to email a document, upload it to a web portal with size restrictions, or simply save storage space, understanding how to compress PDFs effectively is essential.
This comprehensive guide walks you through the technical details of PDF compression, from understanding what makes PDFs large to implementing practical compression strategies that preserve quality. You'll learn about different compression algorithms, command-line tools, and when to use lossy versus lossless techniques.
Table of Contents
- Why PDFs Get Large
- Understanding Compression Methods
- Lossy vs Lossless Compression
- Image Optimization Techniques
- Font Subsetting and Embedding
- Recommended Settings by Use Case
- Ghostscript Commands for Compression
- Python Libraries and Automation
- Compression Comparison and Benchmarks
- Practical Tips and Best Practices
- Frequently Asked Questions
- Related Articles
Why PDFs Get Large
A PDF is fundamentally a container format that can hold multiple types of content: text, images, fonts, vector graphics, JavaScript, multimedia elements, and extensive metadata. Understanding what contributes to file size is the first step toward effective compression.
The PDF specification allows for incredible flexibility, but this comes at a cost. Each element you add increases the file size, and without proper optimization, even simple documents can become surprisingly large.
| Source | Typical Impact | Example | Solution |
|---|---|---|---|
| High-resolution images | 60-90% of file size | A single 300 DPI photo can be 5-15 MB | Downsample to 150 DPI for screen viewing |
| Embedded fonts | 200 KB - 5 MB per font | CJK fonts can exceed 10 MB each | Use font subsetting to include only used glyphs |
| Uncompressed streams | 2-5x larger than needed | Text and vector data without Flate compression | Apply stream compression during PDF creation |
| Duplicate resources | Variable | Same image embedded on every page | Reference resources once, reuse across pages |
| Metadata and thumbnails | 100 KB - 2 MB | Page thumbnails, XMP metadata, edit history | Strip unnecessary metadata and thumbnails |
| Incremental saves | 10-50% overhead | Each save appends changes instead of rewriting | Linearize or rewrite the entire PDF structure |
Use our PDF Info tool to analyze exactly what is consuming space in your file. This diagnostic step is crucial before applying compression, as it tells you where to focus your optimization efforts.
Pro tip: Images are almost always the primary culprit. If your PDF is over 5 MB, start by examining image resolution and compression settings before worrying about fonts or metadata.
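The incremental-save overhead listed in the table can often be spotted without special tooling, because each incremental save appends a new end-of-file marker. The sketch below counts those markers in raw bytes. The function names are illustrative, and the check is only a heuristic: linearized files and embedded documents can also contain more than one marker.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers in the raw bytes of a PDF file."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_saved(pdf_bytes: bytes) -> bool:
    """Heuristic: more than one %%EOF marker hints at appended revisions."""
    return count_eof_markers(pdf_bytes) > 1


# Tiny hand-made byte strings standing in for real files (illustrative only).
single_save = b"%PDF-1.4\n...objects...\n%%EOF\n"
two_saves = single_save + b"...appended objects...\n%%EOF\n"

print(looks_incrementally_saved(single_save))
print(looks_incrementally_saved(two_saves))
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to `looks_incrementally_saved`.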
Understanding Compression Methods
PDF compression isn't a single technique but rather a collection of strategies applied to different content types within the document. Each type of content—images, text, fonts, vector graphics—requires a different approach.
Image Downsampling
Downsampling is the most effective compression technique for image-heavy PDFs. It reduces image resolution by decreasing the number of pixels, which directly reduces file size. A 300 DPI image downsampled to 150 DPI keeps half the pixels in each dimension, so it has one-quarter of the original pixel count.
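The one-quarter figure follows from simple arithmetic, sketched below for an assumed 8.5 x 11 inch page image. The helper names are illustrative and not part of any library.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of an image printed at the given size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_ratio(source_dpi, target_dpi):
    """Fraction of pixels that remain after downsampling."""
    return (target_dpi / source_dpi) ** 2


w300, h300 = pixel_dimensions(8.5, 11, 300)
w150, h150 = pixel_dimensions(8.5, 11, 150)

print(w300, h300)             # 2550 3300
print(w150, h150)             # 1275 1650
print(pixel_ratio(300, 150))  # 0.25
```

Halving the DPI halves both dimensions, so the pixel count (and the uncompressed data size) drops to a quarter.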
There are three primary downsampling methods:
- Bicubic downsampling — Provides the best quality by averaging pixel neighborhoods using a cubic function. This method produces smooth gradients and is ideal for photographs and complex images.
- Average downsampling — Faster than bicubic, averages pixels in a simpler way. Quality is slightly lower but still acceptable for most use cases.
- Subsampling — The fastest method, simply picks the nearest pixel without averaging. Can produce blocky artifacts and should only be used when speed is critical and quality is secondary.
The resolution you choose depends entirely on the document's intended use. Screen viewing rarely requires more than 150 DPI, while professional printing typically needs 300 DPI or higher.
Image Recompression
After downsampling, you can further reduce size by recompressing images with more efficient codecs. Different image types benefit from different compression algorithms.
| Format | Type | Best For | Quality Notes | Typical Compression Ratio |
|---|---|---|---|---|
| JPEG | Lossy | Photos, scanned documents | Good at quality 75-85 | 10:1 to 20:1 |
| JPEG2000 | Lossy/Lossless | High-quality photos | Better than JPEG at same size | 15:1 to 30:1 |
| JBIG2 | Lossy/Lossless | Black & white text/scans | Typically 3-5x smaller than CCITT Group 4 | 50:1 to 100:1 |
| Flate (ZIP) | Lossless | Screenshots, diagrams | Perfect quality, moderate compression | 2:1 to 4:1 |
| CCITT Group 4 | Lossless | B&W fax-quality scans | Perfect for 1-bit images | 10:1 to 20:1 |
JPEG remains the most widely supported and effective format for color photographs. JPEG2000 offers better compression but has limited support in some PDF readers. For black-and-white documents, JBIG2 is remarkably efficient but requires specialized tools.
Lossy vs Lossless Compression
Understanding the difference between lossy and lossless compression is fundamental to making informed decisions about PDF optimization.
Lossless Compression
Lossless compression reduces file size without discarding any information. When you decompress the file, you get back exactly what you started with, bit for bit. This is essential for documents where accuracy matters.
Common lossless techniques include:
- Flate/Deflate compression — The ZIP algorithm, applied to text streams and vector graphics
- LZW compression — An older algorithm, less efficient than Flate but still used in some PDFs
- Run-length encoding — Efficient for images with large areas of solid color
- CCITT Group 4 — Specifically designed for black-and-white fax images
Lossless compression typically achieves 2:1 to 4:1 compression ratios for text and vector content. For images, the ratio depends heavily on image characteristics—screenshots compress well, photographs don't.
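A quick way to see the lossless property is to round-trip some data through Deflate, the algorithm behind PDF's Flate filter. The sketch below uses Python's standard `zlib` module on a made-up, highly repetitive content stream. Real streams are less repetitive, so expect lower ratios than this toy input produces.

```python
import zlib

# Hypothetical PDF content stream: the same text-drawing operators repeated.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF compression) Tj ET\n" * 200

compressed = zlib.compress(content, level=9)
restored = zlib.decompress(compressed)

# Lossless means the round trip returns the input bit for bit.
print(restored == content)
print(f"{len(content)} bytes -> {len(compressed)} bytes")
```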
Lossy Compression
Lossy compression achieves much higher compression ratios by permanently discarding information that's less perceptible to human vision. Once applied, you cannot recover the original data.
The key is finding the sweet spot where file size decreases significantly but quality remains acceptable for your use case. A JPEG quality setting of 85 typically provides excellent visual quality while reducing file size by 80-90% compared to uncompressed.
Quick tip: Never apply lossy compression multiple times to the same image. Each compression pass degrades quality further. If you need to recompress, always start from the original uncompressed source if possible.
When to Use Each Type
Choose lossless compression when:
- The document contains legal, medical, or financial information requiring perfect accuracy
- Text must remain crisp and readable at any zoom level
- The PDF will be edited or processed further
- You're working with line art, diagrams, or screenshots with text
Choose lossy compression when:
- The document is primarily photographs or scanned images
- File size is more important than perfect visual fidelity
- The document is for screen viewing only, not professional printing
- You need to meet strict file size limits (email attachments, web uploads)
Image Optimization Techniques
Since images typically account for 60-90% of PDF file size, optimizing them delivers the biggest impact. Here's a systematic approach to image optimization.
Resolution Guidelines
The appropriate resolution depends entirely on how the PDF will be used:
- 72-96 DPI — Web viewing, email attachments, mobile devices
- 150 DPI — General screen viewing, presentations, internal documents
- 300 DPI — Professional printing, high-quality output
- 600+ DPI — Fine art reproduction, medical imaging, archival purposes
Most PDFs intended for screen viewing can safely use 150 DPI without any perceptible quality loss. This alone can reduce file size by 75% compared to 300 DPI images.
Color Space Optimization
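Color images use significantly more data than grayscale or black-and-white. If your document doesn't require color, converting to grayscale can reduce image size by up to roughly two-thirds, since three 8-bit color channels become one. With JPEG-compressed images the saving is usually smaller, because JPEG already stores color information compactly.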
For documents that are primarily text with occasional color elements, consider:
- Converting text pages to black-and-white (1-bit)
- Keeping only essential pages in color
- Using grayscale instead of color where possible
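To compare these options, it helps to estimate the uncompressed size of an image in each color mode. The sketch below assumes 8 bits per channel for color and grayscale, 1 bit per pixel for black-and-white, and rows padded to whole bytes. The function name and the page dimensions are illustrative.

```python
BITS_PER_PIXEL = {"rgb": 24, "gray": 8, "mono": 1}


def raw_image_bytes(width_px, height_px, mode):
    """Uncompressed image size in bytes, with each row padded to a whole byte."""
    row_bytes = (width_px * BITS_PER_PIXEL[mode] + 7) // 8
    return row_bytes * height_px


# An 8.5 x 11 inch page at 150 DPI is 1275 x 1650 pixels.
for mode in ("rgb", "gray", "mono"):
    print(mode, raw_image_bytes(1275, 1650, mode))
```

These are sizes before any codec is applied. Compressed sizes depend on image content and the filter used.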
Our PDF to Images tool can help you extract and analyze individual pages to determine which ones actually need color.
JPEG Quality Settings
JPEG quality is typically specified on a scale from 0-100, though the exact meaning varies by implementation. Here's a practical guide:
- 90-100 — Minimal compression, very large files, indistinguishable from original
- 85-89 — Excellent quality, good compression, recommended for most uses
- 75-84 — Good quality, significant compression, suitable for web and screen viewing
- 60-74 — Acceptable quality, high compression, minor artifacts may be visible
- Below 60 — Poor quality, obvious artifacts, only for thumbnails or previews
For most business documents and presentations, a quality setting of 80-85 provides the best balance between file size and visual quality.
Font Subsetting and Embedding
Fonts can contribute significantly to PDF file size, especially when using multiple typefaces or non-Latin scripts. Understanding font embedding and subsetting is crucial for optimization.
How Font Embedding Works
When you create a PDF, you have three options for handling fonts:
- Embed full fonts — Include the entire font file, ensuring perfect rendering but increasing file size
- Embed subset fonts — Include only the glyphs (characters) actually used in the document
- Don't embed fonts — Rely on the viewer's system fonts, smallest file size but inconsistent rendering
A full font file contains thousands of glyphs covering multiple languages and special characters. If your document uses only 50 characters, subsetting removes the unused glyphs. A 2 MB font might shrink to 30 KB after subsetting.
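As a rough illustration of why subsetting saves so much, the sketch below counts the distinct characters a piece of text actually uses and compares that with a font's total glyph count. It assumes one glyph per character, which ignores ligatures and composite glyphs, and the glyph count of 3000 is a made-up figure.

```python
def used_characters(text):
    """Distinct non-whitespace characters appearing in the text."""
    return {ch for ch in text if not ch.isspace()}


def subset_fraction(text, total_glyphs):
    """Share of a font's glyphs the text needs (one glyph per character assumed)."""
    return len(used_characters(text)) / total_glyphs


sample = "PDF compression reduces file size."
print(len(used_characters(sample)))
print(f"{subset_fraction(sample, 3000):.2%} of a hypothetical 3000-glyph font")
```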
Font Subsetting Best Practices
Modern PDF creation tools automatically subset fonts by default, but you should verify this, especially when working with older software or converting from other formats.
Key considerations:
- Always subset fonts unless you have a specific reason not to (like allowing form field text entry)
- CJK (Chinese, Japanese, Korean) fonts are particularly large—subsetting is essential
- If multiple pages use the same font, the subset is shared across all pages
- Subsetting prevents text editing in most PDF editors, which may be desirable for final documents
Pro tip: If you're creating PDFs programmatically, always enable font subsetting in your library's configuration. This single setting can reduce file size by several megabytes in text-heavy documents.
Standard Fonts
PDF defines 14 "standard fonts" that all PDF readers must support: Times, Helvetica, Courier (each in regular, bold, italic, and bold-italic), Symbol, and ZapfDingbats. Using these fonts eliminates the need for embedding entirely.
However, standard fonts have limitations:
- Limited to basic Latin characters
- Rendering varies slightly between PDF viewers
- No support for advanced typography features
- Not suitable for branded documents requiring specific typefaces
Recommended Settings by Use Case
Different use cases require different compression strategies. Here are proven configurations for common scenarios.
Email Attachments (Target: Under 10 MB)
Most email systems limit attachments to between 10 and 25 MB. For documents intended for email:
- Downsample images to 150 DPI
- Use JPEG compression at quality 80
- Enable font subsetting
- Remove metadata and thumbnails
- Convert color pages to grayscale where appropriate
Expected compression: 70-85% reduction from original size.
Web Publishing (Target: Fast Loading)
For PDFs hosted on websites, optimize for download speed:
- Downsample images to 96-150 DPI
- Use JPEG compression at quality 75-80
- Enable linearization (fast web view)
- Subset all fonts
- Remove unnecessary metadata
Expected compression: 80-90% reduction from original size.
Archival Storage (Target: Quality Preservation)
For long-term archival, prioritize quality over file size:
- Keep images at 300 DPI or original resolution
- Use lossless compression (Flate) for images when possible
- If using JPEG, set quality to 90 or higher
- Embed full fonts to ensure future compatibility
- Preserve all metadata
Expected compression: 20-40% reduction from original size.
Professional Printing (Target: Print Quality)
For documents going to professional printers:
- Maintain 300 DPI for images
- Use CMYK color space
- Embed all fonts (full, not subset)
- Use lossless compression or high-quality JPEG (95+)
- Include crop marks and bleed if required
Expected compression: 10-30% reduction from original size.
Mobile Viewing (Target: Small File Size)
For documents primarily viewed on mobile devices:
- Downsample images to 96-120 DPI
- Use aggressive JPEG compression (quality 70-75)
- Convert to grayscale if color isn't essential
- Subset fonts aggressively
- Remove all non-essential metadata
Expected compression: 85-95% reduction from original size.
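If you apply these recommendations from a script, one option is to keep them in a small lookup table. The sketch below is one possible encoding: the class and field names are hypothetical, and where the text gives a range the upper bound is used. The metadata choice for print is an assumption, since the list above does not state one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompressionProfile:
    image_dpi: Optional[int]  # None means "keep the original resolution"
    jpeg_quality: int
    subset_fonts: bool
    strip_metadata: bool
    linearize: bool = False


PROFILES = {
    "email": CompressionProfile(150, 80, True, True),
    "web": CompressionProfile(150, 80, True, True, linearize=True),
    "archival": CompressionProfile(None, 90, False, False),
    # strip_metadata=False for print is an assumption, not from the list above.
    "print": CompressionProfile(300, 95, False, False),
    "mobile": CompressionProfile(120, 75, True, True),
}


def profile_for(use_case):
    """Look up a profile by name, with a readable error for unknown names."""
    try:
        return PROFILES[use_case]
    except KeyError:
        known = ", ".join(sorted(PROFILES))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}")


print(profile_for("email"))
```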
Use our Compress PDF tool to apply these settings automatically based on your selected use case.
Ghostscript Commands for Compression
Ghostscript is a powerful open-source tool for PDF manipulation and compression. It's available for Windows, macOS, and Linux, and provides fine-grained control over compression settings.
Basic Compression Command
The simplest Ghostscript compression command uses predefined settings:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
The -dPDFSETTINGS parameter accepts these presets:
- /screen — Lowest quality, smallest file size (72 DPI images)
- /ebook — Medium quality, moderate file size (150 DPI images)
- /printer — High quality, larger file size (300 DPI images)
- /prepress — Highest quality, largest file size (300 DPI, color preservation)
- /default — Balanced settings, good starting point
Custom Compression Settings
For more control, specify individual parameters:
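gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dDownsampleColorImages=true \
-dColorImageResolution=150 \
-dColorImageDownsampleType=/Bicubic \
-dEncodeColorImages=true \
-dAutoFilterColorImages=false \
-dColorImageFilter=/DCTEncode \
-dDownsampleGrayImages=true \
-dGrayImageResolution=150 \
-dGrayImageDownsampleType=/Bicubic \
-dEncodeGrayImages=true \
-dAutoFilterGrayImages=false \
-dGrayImageFilter=/DCTEncode \
-dDownsampleMonoImages=true \
-dMonoImageResolution=300 \
-dMonoImageDownsampleType=/Bicubic \
-dEncodeMonoImages=true \
-dMonoImageFilter=/CCITTFaxEncode \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf \
-c "<< /ColorImageDict << /QFactor 0.4 /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /GrayImageDict << /QFactor 0.4 /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> >> setdistillerparams" \
-f input.pdf
This command:
- Downsamples color and grayscale images to 150 DPI using bicubic interpolation
- Forces JPEG (DCTEncode) for color and grayscale images by turning off automatic filter selection, which would otherwise override the image filter settings
- Sets JPEG quality through /QFactor in ColorImageDict and GrayImageDict, where lower values mean higher quality and larger files (0.4 is the value the /printer preset uses). The -dJPEGQ switch belongs to Ghostscript's JPEG output devices and has no effect on pdfwrite
- Downsamples monochrome images to 300 DPI
- Compresses monochrome images with CCITT Group 4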
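Long flag lists like this are easy to mistype, so it can help to generate them. The sketch below builds the per-image-type switches from a few parameters and returns an argument list suitable for `subprocess.run`. The helper names are hypothetical. It also emits `-dAutoFilterColorImages=false` and `-dAutoFilterGrayImages=false`, which Ghostscript's documentation describes as necessary for a fixed image filter to be honoured.

```python
def image_args(kind, resolution, downsample="Bicubic", image_filter=None):
    """Build pdfwrite switches for one image type: Color, Gray, or Mono."""
    if kind not in ("Color", "Gray", "Mono"):
        raise ValueError(f"unknown image kind: {kind!r}")
    args = [
        f"-dDownsample{kind}Images=true",
        f"-d{kind}ImageResolution={resolution}",
        f"-d{kind}ImageDownsampleType=/{downsample}",
        f"-dEncode{kind}Images=true",
    ]
    if image_filter:
        if kind != "Mono":
            # Auto-filtering exists only for color and grayscale images.
            args.append(f"-dAutoFilter{kind}Images=false")
        args.append(f"-d{kind}ImageFilter=/{image_filter}")
    return args


def build_gs_command(input_path, output_path, gs="gs"):
    """Assemble the full argument list; nothing is executed here."""
    return [
        gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
        *image_args("Color", 150, image_filter="DCTEncode"),
        *image_args("Gray", 150, image_filter="DCTEncode"),
        *image_args("Mono", 300, image_filter="CCITTFaxEncode"),
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={output_path}", input_path,
    ]


print(" ".join(build_gs_command("input.pdf", "output.pdf")))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with file names that contain spaces.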
Font Subsetting with Ghostscript
To enable font subsetting:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dSubsetFonts=true \
-dEmbedAllFonts=true \
-dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf input.pdf
The -dSubsetFonts=true parameter ensures only used glyphs are embedded, while -dEmbedAllFonts=true ensures all fonts are embedded (as subsets).
Quick tip: Always test Ghostscript commands on a copy of your PDF first. Some settings can cause unexpected rendering issues with complex documents.
Batch Processing Multiple Files
To compress multiple PDFs in a directory (Linux/macOS):
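for file in *.pdf; do
    # Skip outputs from a previous run so they are not compressed again
    case "$file" in compressed_*) continue ;; esac
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
        -sOutputFile="compressed_${file}" "${file}"
done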
For Windows PowerShell:
Get-ChildItem *.pdf | ForEach-Object {
    gswin64c -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 `
        -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH `
        -sOutputFile="compressed_$($_.Name)" $_.Name
}
Python Libraries and Automation
For developers and power users, Python offers several libraries for PDF compression and manipulation. These are ideal for automating compression workflows or integrating PDF optimization into larger applications.
PyPDF2 and pikepdf
PyPDF2 (now continued under the name pypdf) is a pure-Python library for basic PDF operations, while pikepdf, which is built on the C++ library qpdf, provides more advanced features with better performance:
import pikepdf

# Open and save with compression
with pikepdf.open('input.pdf') as pdf:
    pdf.save('output.pdf', compress_streams=True)
This applies lossless stream compression but doesn't handle image recompression. For that, you need additional tools.
img2pdf for Image-to-PDF Conversion
When creating PDFs from images, img2pdf produces smaller files than most alternatives:
import img2pdf

with open('output.pdf', 'wb') as f:
    f.write(img2pdf.convert(['image1.jpg', 'image2.jpg']))
It embeds images without recompression, preserving their existing JPEG compression.
Pillow for Image Preprocessing
Before creating a PDF, optimize images with Pillow:
from PIL import Image
img = Image.open('input.jpg')
# Resize to 150 DPI equivalent (assuming original is 300 DPI)
img = img.resize((img.width // 2, img.height // 2), Image.BICUBIC)
# Save with JPEG quality 85
img.save('output.jpg', 'JPEG', quality=85, optimize=True)
Calling Ghostscript from Python
For maximum control, call Ghostscript directly from Python:
import subprocess

def compress_pdf(input_path, output_path, quality='ebook'):
    subprocess.run([
        'gs',
        '-sDEVICE=pdfwrite',
        '-dCompatibilityLevel=1.4',
        f'-dPDFSETTINGS=/{quality}',
        '-dNOPAUSE',
        '-dQUIET',
        '-dBATCH',
        f'-sOutputFile={output_path}',
        input_path
    ], check=True)

compress_pdf('input.pdf', 'output.pdf', 'ebook')
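Recompression does not always help: a PDF that is already well optimized can come out larger. A small guard, sketched below, keeps whichever file is smaller. The function name and the file names in the usage comment are illustrative.

```python
import shutil
from pathlib import Path


def keep_smaller(original, candidate, destination):
    """Copy the smaller of two files to destination and return the path chosen."""
    original, candidate, destination = Path(original), Path(candidate), Path(destination)
    if candidate.stat().st_size < original.stat().st_size:
        winner = candidate
    else:
        winner = original
    shutil.copyfile(winner, destination)
    return winner


# Example usage, assuming both files already exist:
# keep_smaller('input.pdf', 'output.pdf', 'final.pdf')
```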
Complete Compression Script
Here's a complete Python script that compresses a PDF with custom settings:
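import io

import pikepdf
from PIL import Image


def compress_pdf(input_path, output_path, image_quality=85, max_dpi=150):
    seen = set()  # images shared between pages are processed only once
    with pikepdf.open(input_path) as pdf:
        for page in pdf.pages:
            # PDF images store no DPI value. Estimate it by assuming each
            # image spans the full page width (1 point = 1/72 inch).
            box = page.mediabox
            page_width_in = (float(box[2]) - float(box[0])) / 72
            for image_key in page.images.keys():
                raw_image = page.images[image_key]
                if raw_image.objgen in seen:
                    continue
                seen.add(raw_image.objgen)
                # Leave stencil masks and images with transparency untouched
                if raw_image.get('/ImageMask') or '/SMask' in raw_image:
                    continue
                try:
                    pil_image = pikepdf.PdfImage(raw_image).as_pil_image()
                except Exception:
                    continue  # encoding that pikepdf cannot decode
                if pil_image.mode == '1':
                    continue  # JPEG is a poor fit for 1-bit images
                # Downsample when the estimated DPI exceeds the target
                dpi = pil_image.width / page_width_in
                if dpi > max_dpi:
                    scale = max_dpi / dpi
                    new_size = (max(1, int(pil_image.width * scale)),
                                max(1, int(pil_image.height * scale)))
                    pil_image = pil_image.resize(new_size, Image.BICUBIC)
                # Convert palette, CMYK, and other modes to RGB before encoding
                if pil_image.mode != 'L':
                    pil_image = pil_image.convert('RGB')
                img_byte_arr = io.BytesIO()
                pil_image.save(img_byte_arr, format='JPEG',
                               quality=image_quality, optimize=True)
                # Replace the stream and update the image dictionary to match
                raw_image.write(img_byte_arr.getvalue(),
                                filter=pikepdf.Name.DCTDecode)
                raw_image.Width, raw_image.Height = pil_image.size
                raw_image.BitsPerComponent = 8
                raw_image.ColorSpace = (pikepdf.Name.DeviceGray
                                        if pil_image.mode == 'L'
                                        else pikepdf.Name.DeviceRGB)
        pdf.save(output_path, compress_streams=True)


compress_pdf('input.pdf', 'output.pdf')
This script opens a PDF, iterates through the images on each page, downsamples those whose estimated resolution exceeds 150 DPI, recompresses them as JPEG at quality 85, and saves the result with stream compression enabled. Because a PDF image has no DPI of its own, the estimate assumes each image spans the full page width. Masks, images with transparency, and 1-bit images are left unchanged, and the script should be treated as a starting point to test on copies rather than a general-purpose optimizer.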
Compression Comparison and Benchmarks
Understanding the trade-offs between different compression settings helps you make informed decisions. Here are real-world benchmarks from compressing various document types.
Sample Document Compression Results
| Document Type | Original Size | Screen (72 DPI) | Ebook (150 DPI) | Printer (300 DPI) | Quality Impact |
|---|---|---|---|---|---|
| Photo-heavy brochure | 45 MB | 3.2 MB (93% reduction) | 8.5 MB (81% reduction) | 22 MB (51% reduction) | Screen: noticeable, Ebook: minimal, Printer: none |
| Scanned text document | 28 MB | 2.1 MB (92% reduction) | 4.8 MB (83% reduction) | 12 MB (57% reduction) | Screen: acceptable, Ebook: good, Printer: excellent |
| Technical manual with diagrams | 18 MB | 2.8 MB (84% reduction) | 5.2 MB (71% reduction) | 9.5 MB (47% reduction) | Screen: good, Ebook: excellent, Printer: excellent |
| Presentation slides | 35 MB | 4.1 MB (88% reduction) | 7.8 MB (78% reduction) | 16 MB (54% reduction) | Screen: excellent, Ebook: excellent, Printer: good |
| Form with minimal images | 5 MB | 0.8 MB (84% reduction) | 1.2 MB (76% reduction) | 2.1 MB (58% reduction) |
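The reduction percentages in the table follow directly from the original and compressed sizes. The short sketch below reproduces a few of them. The function name is illustrative.

```python
def reduction_percent(original_mb, compressed_mb):
    """Size reduction as a whole-number percentage of the original size."""
    if original_mb <= 0:
        raise ValueError("original size must be positive")
    return round((1 - compressed_mb / original_mb) * 100)


print(reduction_percent(45, 3.2))  # 93, photo-heavy brochure at /screen
print(reduction_percent(28, 4.8))  # 83, scanned text document at /ebook
print(reduction_percent(18, 9.5))  # 47, technical manual at /printer
```

The same function is handy for logging results when you benchmark your own documents.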