PdfReader.Open throws "Page has no MediaBox" for pages that inherit /MediaBox from an ancestor /Pages node #375

@professor-k

Summary

PdfReader.Open fails on any PDF whose page(s) do not carry a direct /MediaBox but inherit it from a parent /Pages node. /MediaBox is an inheritable page attribute (PDF 32000-1:2008, §7.7.3.4, Table 31), so this is a valid, common construction — e.g. every PDF produced by ComponentOne (GrapeCity) C1.C1Preview PdfExportProvider is laid out this way.

The page object is constructed (which reads/validates /MediaBox) before the inherited attributes are applied, so the lookup fails.

Version

  • Broken: PDFsharp 7.0.0-preview-1 (NuGet), net8.0. The same code is present on main (HEAD).
  • Works: PDFsharp 6.2.4 and 1.50.5147 — both open the same file as a 612×792 page.

So this is a regression introduced by the 7.0 page-tree rewrite; the 6.x reader handled inherited attributes correctly.

Repro

Attached: c1-inherited-mediabox.pdf — a trivial one-page document produced by ComponentOne PdfExportProvider. Its page tree is:

10 0 obj <</Type /Pages /Count 1 /MediaBox [0 0 612 792] /Resources 11 0 R /Kids [12 0 R]>>
12 0 obj <</Type /Page /Parent 10 0 R /Contents 2 0 R /Resources 11 0 R>>

/MediaBox is on the /Pages node (obj 10) and inherited by the page (obj 12), which has none of its own.

using var fs = File.OpenRead("c1-inherited-mediabox.pdf");
var doc = PdfReader.Open(fs, PdfDocumentOpenMode.Import); // throws

Expected

The page opens with size 612×792, with /MediaBox resolved via inheritance (as PDFsharp 6.2.4, 1.50.5147, and other readers do).

Actual

System.InvalidOperationException: Page has no MediaBox.
   at PdfSharp.Pdf.PdfPage.Initialize(Boolean setupSizeFromMediaBox)
   at PdfSharp.Pdf.PdfPage..ctor(PdfDictionary dict)
   at PdfSharp.Pdf.ElementsBase.CreateContainer(Type type, PdfContainer oldContainer, Boolean createIndirect)
   at PdfSharp.Pdf.PdfPages.TraversePageTree(List`1 pages, PdfPageTreeNode treeNode, InheritedValues inheritedValues)
   at PdfSharp.Pdf.PdfPages.FlattenPageTree()
   at PdfSharp.Pdf.IO.PdfReader.OpenFromStream(...)
   at PdfSharp.Pdf.IO.PdfReader.Open(...)

Root cause

In PdfPages.TraversePageTree (src/foundation/src/PDFsharp/src/PdfSharp/Pdf/PdfPages.cs):

var page = (PdfPage)kid.Elements.CreateContainer(typeof(PdfPage), kid, true); // (1) constructs PdfPage -> throws
page.ApplyInheritedValues(ref inheritedValues);                              // (2) inherited /MediaBox applied here, too late

CreateContainer (1) runs the PdfPage(PdfDictionary) constructor, which calls Initialize(setupSizeFromMediaBox: true). In PdfPage.Initialize (.../Pdf/PdfPage.cs):

var rectangle = Elements.GetRectangle(InheritablePageKeys.MediaBox, false);
if (rectangle.IsEmpty)
    throw new InvalidOperationException("Page has no MediaBox.");

GetRectangle(..., false) reads only the page's own dictionary (no inheritance), so for an inheriting page it is empty and Initialize throws — before ApplyInheritedValues (2) ever runs.
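The ordering problem can be illustrated in isolation with a minimal sketch. It is a simplified model, not PDFsharp code: ModelPage, ApplyInherited, and OrderingDemo are hypothetical names, and the only behavior taken from the report is that validation inside the constructor runs before inherited values are copied in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a page dictionary; not a PDFsharp type.
sealed class ModelPage
{
    public Dictionary<string, double[]> Elements { get; } = new();

    // When validateInCtor is true, validation runs inside the constructor,
    // i.e. before the caller has any chance to apply inherited values.
    public ModelPage(Dictionary<string, double[]> own, bool validateInCtor)
    {
        foreach (var kv in own) Elements[kv.Key] = kv.Value;
        if (validateInCtor) Validate();
    }

    public void Validate()
    {
        if (!Elements.ContainsKey("/MediaBox"))
            throw new InvalidOperationException("Page has no MediaBox.");
    }

    // Copies ancestor attributes only where the page has no direct entry.
    public void ApplyInherited(Dictionary<string, double[]> inherited)
    {
        foreach (var kv in inherited)
            Elements.TryAdd(kv.Key, kv.Value);
    }
}

static class OrderingDemo
{
    // Returns "WIDTHxHEIGHT" on success, or the exception message on failure.
    public static string Run(bool validateInCtor)
    {
        var inherited = new Dictionary<string, double[]>
        {
            ["/MediaBox"] = new double[] { 0, 0, 612, 792 }
        };
        var own = new Dictionary<string, double[]>(); // page has no direct /MediaBox
        try
        {
            var page = new ModelPage(own, validateInCtor); // step (1)
            page.ApplyInherited(inherited);                // step (2)
            page.Validate();                               // validation after inheritance
            var box = page.Elements["/MediaBox"];
            return $"{box[2] - box[0]}x{box[3] - box[1]}";
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message;
        }
    }
}
```

In this model, OrderingDemo.Run(true) fails in step (1) in the same way as the stack trace above, while OrderingDemo.Run(false) reaches step (2) and resolves the inherited box.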

Suggested fix

Apply inherited values before the page's size is set up. For example:

  • call ApplyInheritedValues prior to (or as part of) Initialize;
  • have Initialize/GetRectangle consult inherited attributes (walk /Parent) when the page has no direct /MediaBox;
  • defer the MediaBox validation until after inheritance has been promoted in TraversePageTree.
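As a rough illustration of the "walk /Parent" option, the lookup could be sketched as follows. This is not PDFsharp code: Node and InheritedLookup.Resolve are hypothetical names, and the model assumes each node exposes its own entries plus a link to its parent. How this would map onto PdfPage and PdfDictionary is left open.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical minimal page-tree node; not a PDFsharp type.
sealed class Node
{
    public Node? Parent { get; init; }
    public Dictionary<string, double[]> Elements { get; } = new();
}

static class InheritedLookup
{
    // Returns the nearest value for an inheritable key, starting at the node
    // itself and walking parent links upward; null if no node on the chain has it.
    public static double[]? Resolve(Node node, string key)
    {
        var seen = new HashSet<Node>(); // stops on malformed, cyclic parent chains
        for (Node? current = node;
             current != null && seen.Add(current);
             current = current.Parent)
        {
            if (current.Elements.TryGetValue(key, out var value))
                return value;
        }
        return null;
    }
}
```

With a /Pages-like node carrying [0 0 612 792] and a child page that has no entry of its own, Resolve(page, "/MediaBox") returns the parent's array, and a key that is absent from every ancestor yields null, which a caller could then treat as the "no MediaBox" error case.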
