PDF Metadata: What It Is and How to Edit It
Table of Contents
- What Is PDF Metadata?
- Types of PDF Metadata
- Why Metadata Matters
- How to View PDF Metadata
- How to Edit PDF Metadata
- Privacy and Security Concerns
- Metadata Standards and Schemas
- Using Metadata to Compare Documents
- Metadata in Professional Workflows
- Troubleshooting Common Metadata Issues
- Frequently Asked Questions
- Related Articles
What Is PDF Metadata?
Every PDF file carries hidden information that most users never see. This invisible layer of data—called metadata—describes the document itself rather than its visible content. Think of it as a detailed label on a package: it tells you who created it, when it was made, what software was used, and much more, all without opening the document to read its pages.
PDF metadata serves essential functions in document management, search, organization, and compliance. Libraries use metadata to catalog digital collections. Legal teams rely on metadata timestamps to establish document provenance. SEO specialists optimize PDF metadata to improve search engine rankings. Organizations use metadata standards to maintain consistent document properties across thousands of files.
Understanding metadata isn't just for power users—it's important for anyone who creates or shares PDFs. The metadata in your documents might reveal more about you and your workflow than you realize, and knowing how to control it gives you power over your digital privacy and professional image.
Metadata exists in two primary layers within a PDF file. The first is the Document Information Dictionary, a legacy format that's been part of PDF since version 1.0. The second is XMP (Extensible Metadata Platform), introduced in PDF 1.4, which uses XML to store more complex and extensible metadata. Modern PDFs typically contain both formats for backward compatibility.
Quick tip: You can view basic PDF metadata in most PDF readers by opening File > Properties or pressing Ctrl+D (Windows) or Cmd+D (Mac). This reveals the document's title, author, creation date, and other standard fields.
Types of PDF Metadata
Document Information Dictionary
The most basic form of PDF metadata, the Document Information Dictionary has been part of the PDF specification since its earliest versions. It stores standard properties that appear in virtually every PDF reader's document properties dialog.
The eight most commonly used fields in the Document Information Dictionary are listed below (the specification also defines a ninth, Trapped, used in print workflows):
- Title: The document's title, which may differ from the filename
- Author: The person who created the document
- Subject: A brief description of the document's topic
- Keywords: Search terms relevant to the document content
- Creator: The application that created the original document (e.g., "Microsoft Word")
- Producer: The application that converted the document to PDF (e.g., "Adobe PDF Library 15.0")
- CreationDate: When the document was first created
- ModDate: When the document was last modified
These fields are simple text strings (except for dates, which use a specific format). While they're called "standard," they're all optional—a PDF can exist with none of these fields populated.
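The date format mentioned above looks like `D:YYYYMMDDHHmmSS+HH'mm'`, where everything after the year is optional. As a minimal sketch (the function name `parse_pdf_date` is illustrative, not part of any library), a date string can be converted to a Python `datetime` like this:

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional Z, +HH'mm' or -HH'mm' offset.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)

def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a datetime (naive if no offset is given)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()
    moment = datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
    )
    if sign is None:
        return moment  # no timezone information in the string
    offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    return moment.replace(tzinfo=timezone(offset))

print(parse_pdf_date("D:20250115093000+01'00'").isoformat())
```

Real files sometimes contain malformed dates, so a production parser would likely need to be more forgiving than this one.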
XMP Metadata
XMP (Extensible Metadata Platform) is Adobe's standard for embedding metadata in files. Introduced in 2001, XMP uses XML to store metadata in a structured, extensible format that can accommodate custom properties and complex relationships.
XMP metadata is organized into namespaces, each serving a specific purpose:
- Dublin Core (dc): Basic bibliographic information like title, creator, description, and subject
- XMP Basic (xmp): Fundamental properties including creation date, modification date, and creator tool
- XMP Rights Management (xmpRights): Copyright and usage rights information
- PDF Schema (pdf): PDF-specific properties like keywords, PDF version, and producer
- Photoshop Schema (photoshop): Image-specific metadata when PDFs contain photos
- EXIF: Camera and image capture data for photographs
- IPTC: Journalism and media industry metadata standards
XMP's XML structure allows for much richer metadata than the simple key-value pairs of the Document Information Dictionary. You can store arrays of values, nested structures, and custom properties specific to your organization or workflow.
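To make the difference concrete, here is a small sketch that reads a language-tagged title, an ordered list of creators, and a simple property from an XMP packet using only Python's standard library. The sample packet is hand-written for illustration (the second creator, "Investor Relations", is a made-up value), and real packets can also store simple properties as XML attributes, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the RDF wrapper, Dublin Core, and XMP Basic.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hand-written sample packet, for illustration only.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Q4 2025 Financial Report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Finance Department</rdf:li><rdf:li>Investor Relations</rdf:li></rdf:Seq></dc:creator>
   <xmp:CreatorTool>Microsoft Word</xmp:CreatorTool>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def read_xmp(packet: str) -> dict:
    """Pull a few common properties out of an XMP packet."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    tool = root.find(".//xmp:CreatorTool", NS)
    return {
        "title": title.text if title is not None else None,
        "creators": [li.text for li in creators],  # an array, not one string
        "creator_tool": tool.text if tool is not None else None,
    }

print(read_xmp(SAMPLE_XMP))
```

Note how `dc:creator` holds an ordered list, something the single Author string in the Document Information Dictionary cannot express.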
Structural Metadata
Beyond descriptive metadata, PDFs contain structural metadata that defines how the document is organized:
- Page labels: Custom numbering schemes (Roman numerals for front matter, Arabic for body)
- Bookmarks: Navigation structure and outline hierarchy
- Document structure tags: Semantic markup for accessibility (headings, paragraphs, lists)
- Logical structure: Reading order and content relationships
- Attachments: Embedded files and their descriptions
This structural metadata is crucial for accessibility, navigation, and document understanding by assistive technologies.
Technical Metadata
PDFs also store technical information about the file itself:
- PDF version: Which version of the PDF specification the file conforms to
- Page dimensions: Size of each page in points
- Color space: RGB, CMYK, or other color models used
- Font information: Embedded fonts and their properties
- Compression methods: How images and content streams are compressed
- Encryption settings: Security restrictions and permissions
- Linearization: Whether the PDF is optimized for web viewing
This technical metadata is typically managed automatically by PDF creation software and isn't meant for manual editing.
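Page dimensions reported in points can be hard to read at a glance. A point is 1/72 of an inch, so a short conversion helps when interpreting what a tool reports. This is a sketch; the helper names are illustrative.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

def points_to_mm(points: float) -> float:
    """Convert a length in PDF points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH

def describe_page(width_pt: float, height_pt: float) -> str:
    """Summarize a page size given in points."""
    w, h = points_to_mm(width_pt), points_to_mm(height_pt)
    orientation = "landscape" if width_pt > height_pt else "portrait"
    return f"{w:.0f} x {h:.0f} mm ({orientation})"

print(describe_page(612, 792))  # US Letter
print(describe_page(595, 842))  # A4, rounded to whole points
```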
| Metadata Type | Format | Primary Use | User Editable |
|---|---|---|---|
| Document Info Dictionary | Key-value pairs | Basic document properties | Yes |
| XMP Metadata | XML | Extended properties, rights management | Yes |
| Structural Metadata | PDF objects | Navigation, accessibility | Partially |
| Technical Metadata | PDF internal structures | File specifications, rendering | No |
Why Metadata Matters
Document Organization and Searchability
Proper metadata transforms a collection of files into a searchable, organized library. When you store hundreds or thousands of PDFs, filenames alone aren't enough to find what you need quickly.
Well-maintained metadata enables:
- Desktop search: Operating systems index PDF metadata, making documents findable through system search
- Document management systems: Enterprise systems rely on metadata for categorization and retrieval
- Digital asset management: Creative teams use metadata to track versions, rights, and usage
- Research databases: Academic institutions catalog papers using standardized metadata schemas
A PDF titled "Q4_Report_Final_v3_FINAL.pdf" tells you nothing. But metadata fields for Title ("Q4 2025 Financial Report"), Author ("Finance Department"), Subject ("Quarterly earnings and projections"), and Keywords ("revenue, expenses, forecast, 2025") make that document instantly discoverable.
SEO and Web Visibility
Search engines can index PDFs they find when crawling websites. Google, Bing, and other search engines may read fields such as Title, Author, Subject, and Keywords to help interpret a document's content and relevance.
Optimizing PDF metadata for SEO involves:
- Writing descriptive, keyword-rich titles that match search intent
- Including relevant keywords in the Subject and Keywords fields
- Ensuring the Author field reflects your brand or organization
- Keeping metadata consistent with the document's actual content
A white paper whose filename is "Document1.pdf" and whose metadata is empty is likely to fare worse in search than one titled "Complete Guide to Cloud Security Best Practices 2026" with properly optimized metadata fields.
Legal and Compliance Requirements
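In legal, financial, and regulated industries, metadata serves as evidence of document authenticity and chain of custody. Courts may consider metadata as evidence of when documents were created and modified, although it is usually weighed alongside other records because these fields are easy to change.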
Legal teams use metadata to:
- Establish document timelines in litigation
- Verify document authenticity and detect tampering
- Track document versions and revisions
- Comply with discovery requirements in legal proceedings
- Meet regulatory record-keeping standards
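Financial institutions must maintain audit trails showing when documents were created, who created them, and what changes were made. Metadata contributes to this trail automatically by recording creation and modification timestamps along with author and software fields, though it does not log individual changes.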
Professional Presentation
Metadata affects how your documents appear to recipients. When someone opens your PDF, many viewers can show the Title field in the title bar or tab instead of the filename, depending on the document's display settings. A professional title makes a better impression than "Untitled" or a cryptic filename.
Complete metadata signals professionalism and attention to detail. It shows you care about document quality beyond just the visible content.
Pro tip: Before sharing any PDF externally, review its metadata using our Metadata Editor tool. Remove any internal information, set a professional title, and ensure the author field reflects how you want to be identified.
How to View PDF Metadata
Using Adobe Acrobat Reader
Adobe Acrobat Reader, one of the most widely used PDF viewers, provides easy access to document metadata:
- Open your PDF in Acrobat Reader
- Go to File > Properties or press Ctrl+D (Windows) or Cmd+D (Mac)
- The Document Properties dialog opens, showing the Description tab by default
- View Title, Author, Subject, and Keywords in the Description tab
- Click the Additional Metadata button for XMP metadata
- Switch to other tabs (Security, Fonts, Initial View) for additional information
The Additional Metadata dialog shows the complete XMP metadata in a tree structure, organized by namespace. You can expand each namespace to see all properties and their values.
Using Other PDF Readers
Most PDF readers provide similar functionality, though the exact menu location varies:
- Foxit Reader: File > Properties or Ctrl+D
- PDF-XChange Editor: File > Document Properties
- Sumatra PDF: File > Properties
- Preview (Mac): Tools > Show Inspector, then click the Info tab
- Evince (Linux): File > Properties
Browser-based PDF viewers (Chrome, Firefox, Edge) typically show limited metadata or none at all. For complete metadata access, use a dedicated PDF application.
Using Command-Line Tools
For batch processing or automation, command-line tools extract metadata efficiently:
ExifTool (cross-platform):
exiftool document.pdf
This displays all metadata fields in a readable format. To extract specific fields:
exiftool -Title -Author -Subject document.pdf
pdfinfo (part of Poppler utilities on Linux/Mac):
pdfinfo document.pdf
pdftk (PDF Toolkit):
pdftk document.pdf dump_data
These tools are invaluable for scripting and batch operations across large document collections.
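When scripting, the text that `pdftk ... dump_data` prints usually needs to be turned into a structure a program can work with. The sketch below assumes the output pairs each `InfoKey:` line with a following `InfoValue:` line. The function name and the sample text are illustrative.

```python
def parse_dump_data(text: str) -> dict:
    """Collect InfoKey/InfoValue pairs from pdftk dump_data output."""
    info = {}
    key = None
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        value = value.strip()
        if name == "InfoKey":
            key = value
        elif name == "InfoValue" and key is not None:
            info[key] = value
            key = None
    return info

# Illustrative sample of what dump_data output can look like.
sample = """InfoBegin
InfoKey: Title
InfoValue: Q4 2025 Financial Report
InfoBegin
InfoKey: CreationDate
InfoValue: D:20250115093000+01'00'
NumberOfPages: 12
"""

print(parse_dump_data(sample))
```

In a real script the text would come from running the command, for example with `subprocess.run(["pdftk", "document.pdf", "dump_data"], capture_output=True, text=True)`.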
Using Online Tools
Web-based tools offer convenient metadata viewing without installing software. Our PDF Metadata Viewer lets you upload a PDF and instantly see all metadata fields in an organized interface.
Online tools are ideal for quick checks, but be cautious about uploading sensitive documents to third-party services. For confidential files, use local software instead.
How to Edit PDF Metadata
Editing in Adobe Acrobat Pro
Adobe Acrobat Pro (the paid version) allows full metadata editing:
- Open your PDF in Acrobat Pro
- Go to File > Properties or press Ctrl+D
- In the Description tab, click in any field to edit it
- Modify Title, Author, Subject, and Keywords as needed
- Click Additional Metadata to edit XMP properties
- In the Advanced panel, you can add custom properties
- Click OK to save changes
Acrobat Pro also offers batch metadata editing through Action Wizard, allowing you to apply the same metadata changes to multiple files simultaneously.
Editing in Free PDF Editors
Several free PDF editors support metadata editing:
PDF-XChange Editor (free version):
- File > Document Properties > Description tab
- Edit fields directly and click OK to save
LibreOffice Draw:
- Open PDF in LibreOffice Draw
- File > Properties > Description tab
- Edit metadata and export as PDF
PDFtk Free:
- Windows GUI for PDFtk with metadata editing interface
- Simple form-based editing of standard fields
Note that free tools often have limitations—they may not support XMP metadata editing or custom properties.
Editing with Command-Line Tools
For automation and batch processing, command-line tools are most efficient:
ExifTool can modify most metadata fields:
exiftool -Title="New Title" -Author="John Smith" document.pdf
To process multiple files:
exiftool -Title="Annual Report" -Author="Finance Dept" *.pdf
pdftk uses a two-step process:
# Extract metadata to a text file
pdftk document.pdf dump_data output metadata.txt
# Edit metadata.txt with a text editor
# Update the PDF with modified metadata
pdftk document.pdf update_info metadata.txt output document_updated.pdf
This approach works well for scripted workflows and integration with other systems.
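The manual editing step in the middle of the pdftk process can itself be scripted. The following sketch rewrites the dumped text in memory. It replaces the value of any key that already exists and appends new `InfoBegin` records for keys that do not. The function name is illustrative, and whether a given pdftk version accepts records appended at the end of the file is an assumption worth checking on a test file first.

```python
def apply_info_updates(dump_text: str, updates: dict) -> str:
    """Return dump_data text with the given Info values replaced or added."""
    pending = dict(updates)
    out = []
    current_key = None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey: "):
            current_key = line[len("InfoKey: "):]
        elif line.startswith("InfoValue:") and current_key in pending:
            # Swap in the new value for a key that is already present.
            line = f"InfoValue: {pending.pop(current_key)}"
        out.append(line)
    # Keys not found in the dump are added as new records.
    for key, value in pending.items():
        out += ["InfoBegin", f"InfoKey: {key}", f"InfoValue: {value}"]
    return "\n".join(out) + "\n"

original = "InfoBegin\nInfoKey: Title\nInfoValue: Old\nNumberOfPages: 3\n"
print(apply_info_updates(original, {"Title": "New", "Author": "Finance Dept"}))
```

The result can be written to metadata.txt and passed to the `update_info` command shown above.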
Using Online Metadata Editors
Our PDF Metadata Editor provides a user-friendly interface for editing metadata without installing software:
- Upload your PDF file
- View current metadata in organized fields
- Edit any field you want to change
- Add new custom properties if needed
- Download the updated PDF with modified metadata
The tool preserves all document content and formatting while updating only the metadata layer. It's perfect for quick edits and one-off changes.
Removing Metadata Entirely
Sometimes you want to strip all metadata from a PDF for privacy reasons:
Adobe Acrobat Pro:
- Tools > Redact > Remove Hidden Information
- Select metadata items to remove
- Click Remove to clean the document
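ExifTool (its PDF edits are written as an incremental update, so the deleted values stay recoverable from the file until it is rewritten by another tool, for example with qpdf --linearize):
exiftool -all= document.pdf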
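pdftk (simply rewriting the file may leave the Info dictionary in place, so use the dump_data and update_info process described earlier with each InfoValue emptied; update_info does not touch the XMP stream, so check the result with pdfinfo or exiftool):
pdftk document.pdf update_info metadata.txt output clean.pdf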
Our Metadata Remover tool strips all metadata while preserving document content, ideal for sharing documents publicly without revealing internal information.
Pro tip: Before removing metadata, save a copy of the original file. Some workflows require metadata for document management, and once removed, it can't be recovered without the original.
Privacy and Security Concerns
What Metadata Can Reveal
PDF metadata can expose information you didn't intend to share. Every time you create or edit a PDF, metadata accumulates, potentially revealing:
- Your identity: Author field often contains your full name or username
- Your organization: Company name in Creator or Producer fields
- Your software: Specific applications and versions you use
- Your location: File paths may include computer names or network locations
- Document history: Creation and modification timestamps reveal workflow patterns
- Editing activity: Number of revisions and time spent editing
- Internal comments: Hidden annotations or review comments
In 2003, a dossier on Iraq published by the UK government as a Word file still carried a revision log naming the officials who had last worked on it. In 2013, metadata in a PDF released by the NSA reportedly revealed a name that had been redacted from the visible text. Cases like these show how metadata can undermine confidentiality.
Metadata in Sensitive Documents
Certain document types require extra metadata scrutiny:
- Legal documents: May reveal attorney-client privileged information or work product
- Financial reports: Can expose internal systems and processes
- Medical records: May contain patient identifiers beyond the visible content
- Government documents: Could reveal classified information or sources
- Whistleblower submissions: Metadata can identify the source
- Anonymous publications: Author information defeats anonymity
Before sharing sensitive documents, always review and clean metadata. Many organizations have policies requiring metadata removal from externally shared files.
Best Practices for Metadata Privacy
Protect your privacy with these metadata management practices:
- Review before sharing: Always check metadata before sending PDFs externally
- Use generic author names: Set author to your organization name rather than personal name
- Remove metadata from public documents: Strip everything except the fields you intend to publish, such as title and author, from PDFs posted on websites
- Configure PDF creation software: Set default metadata values that don't reveal personal information
- Use metadata removal tools: Automate cleaning for documents leaving your organization
- Educate your team: Ensure everyone understands metadata privacy implications
- Implement document policies: Create organizational standards for metadata handling
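The "review before sharing" and "automate cleaning" practices above can be partly scripted. The sketch below checks a dictionary of metadata fields against a few simple rules. The rules, the patterns, and the function name are all hypothetical and would need adapting to an organization's own policy.

```python
import re

# Hypothetical patterns: Windows drive paths, UNC shares, common home folders.
PATH_PATTERN = re.compile(r"([A-Za-z]:\\|\\\\|/Users/|/home/)")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def audit_metadata(info: dict, allowed_authors: set) -> list:
    """Return a list of human-readable findings; empty means nothing flagged."""
    findings = []
    author = info.get("Author", "")
    if author and author not in allowed_authors:
        findings.append(f"Author '{author}' is not an approved generic name")
    for field, value in info.items():
        if PATH_PATTERN.search(value):
            findings.append(f"{field} contains a file path")
        if EMAIL_PATTERN.search(value):
            findings.append(f"{field} contains an email address")
    return findings

risky = {
    "Title": "Q4 Report",
    "Author": "jsmith",
    "Subject": r"C:\Users\jsmith\drafts",
}
print(audit_metadata(risky, {"Finance Department"}))
```

A check like this only covers the fields it is given. It does not replace a full metadata removal step for sensitive documents.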
Metadata and GDPR Compliance
Under GDPR and similar privacy regulations, metadata containing personal information is subject to the same protections as document content. If metadata includes names, email addresses, or other identifiers, it's considered personal data.
Depending on how the data is used, organizations may need to:
- Include metadata in data protection impact assessments
- Respond to subject access requests by providing metadata
- Honor right-to-erasure requests by removing metadata
- Implement appropriate security measures for metadata
- Document metadata handling in privacy policies
Failure to manage metadata properly can result in GDPR violations and significant fines.
| Document Type | Privacy Risk | Recommended Action |
|---|---|---|
| Internal memos | Low | Keep metadata for document management |
| Client proposals | Medium | Review and clean sensitive fields |
| Public white papers | High | Remove all metadata except title and author |
| Legal filings | High | Strip all metadata, verify with tools |
| Anonymous submissions | Critical | Complete metadata removal, use clean system |
Metadata Standards and Schemas
Dublin Core
Dublin Core is one of the most widely adopted metadata standards, originally developed for describing web resources but now used across many document types including PDFs. It defines 15 core elements:
- Title, Creator, Subject, Description, Publisher
- Contributor, Date, Type, Format, Identifier
- Source, Language, Relation, Coverage, Rights
Dublin Core's simplicity makes it ideal for basic document description. Libraries, archives, and digital repositories commonly use Dublin Core for cataloging PDFs.
PDF/A Metadata Requirements
PDF/A, the ISO standard for long-term archiving, has specific metadata requirements to ensure documents remain accessible decades into the future:
- XMP metadata must be present and valid
- Document Information Dictionary must match XMP metadata
- Title field should be populated (good practice for archiving, and mandatory under the PDF/UA accessibility standard)
- Metadata must be embedded in the file, not referenced externally
- Custom metadata schemas must be properly declared
PDF/A-compliant documents ensure metadata survives format migrations and remains readable by future software.
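The requirement that the Document Information Dictionary match the XMP metadata can be illustrated with a small consistency check. The field pairing below follows the correspondence used by PDF/A-1. The function name is illustrative, and the sketch assumes both sides have already been extracted and normalized to comparable strings (dates in particular use different formats in the two layers).

```python
# Info dictionary entry -> equivalent XMP property.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}

def find_mismatches(info: dict, xmp: dict) -> list:
    """List Info entries that are missing from, or differ from, the XMP values."""
    problems = []
    for info_key, xmp_key in INFO_TO_XMP.items():
        if info_key not in info:
            continue  # nothing to compare
        if xmp_key not in xmp:
            problems.append(f"{info_key} has no {xmp_key} counterpart")
        elif info[info_key] != xmp[xmp_key]:
            problems.append(f"{info_key} differs from {xmp_key}")
    return problems

info = {"Title": "Q4 2025 Financial Report", "Author": "Finance Department"}
xmp = {"dc:title": "Q4 Report (draft)"}
print(find_mismatches(info, xmp))
```

A dedicated PDF/A validator checks far more than this. The sketch only shows the idea of keeping the two layers in agreement.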
Industry-Specific Schemas
Different industries have developed specialized metadata schemas for their needs:
PRISM (Publishing Requirements for Industry Standard Metadata):
- Used by publishers for journals, magazines, and books
- Includes fields for ISSN, volume, issue, page numbers
- Supports rights management and distribution information
IPTC (International Press Telecommunications Council):
- Standard for news and media organizations
- Includes fields for byline, headline, caption, copyright
- Supports location data and subject categorization
MARC (Machine-Readable Cataloging):
- Library standard for bibliographic data
- Comprehensive cataloging information
- Used by academic and public libraries worldwide
Creating Custom Metadata Schemas
Organizations can define custom metadata schemas for internal needs. XMP's extensibility allows you to create custom namespaces with your own properties:
- Define your namespace URI (e.g., "http://yourcompany.com/metadata/1.0/")
- Create a schema document describing your properties
- Implement the schema in your PDF creation workflow
- Document the schema for future reference
Custom schemas are useful for tracking internal document properties like project codes, approval status, or department classifications.
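As a rough sketch of the steps above, the following builds the RDF portion of an XMP packet that carries properties in a custom namespace. It uses the example URI from the list. The `yc` prefix, the property names, and the function name are hypothetical. Embedding the result in a PDF still requires a PDF library or a tool such as ExifTool, and PDF/A additionally expects custom schemas to be declared.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CUSTOM_NS = "http://yourcompany.com/metadata/1.0/"  # example URI from the text

# Ask ElementTree to use readable prefixes ("yc" is a hypothetical choice).
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("yc", CUSTOM_NS)

def build_custom_description(properties: dict) -> str:
    """Serialize simple custom properties as an rdf:RDF fragment."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""}
    )
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{CUSTOM_NS}}}{name}").text = value
    return ET.tostring(rdf, encoding="unicode")

fragment = build_custom_description(
    {"ProjectCode": "FIN-2025", "ApprovalStatus": "approved"}
)
print(fragment)
```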
Pro tip: When implementing custom metadata schemas, maintain backward compatibility by also populating standard fields. This ensures your documents remain usable even if custom schema support isn't available.
Using Metadata to Compare Documents
Identifying Document Versions
Metadata provides crucial clues for identifying which version of a document you're looking at. When you have multiple files with similar names, metadata helps determine which is most recent and authoritative.
Key metadata fields for version identification:
- ModDate: Shows when the document was last modified
- Version numbers: Some documents include version in the Subject or Keywords field