Using APDFL to create tools for working with PDF/UA files
Note: This article is a work in progress.
What is PDF/UA?
PDF/UA (Universal Accessibility) is a set of specifications for making PDFs accessible to persons with disabilities by imposing additional requirements on top of the baseline PDF specifications: ISO 14289-1:2014 (PDF/UA-1) for PDF v1.7 (ISO 32000-1) and ISO 14289-2:2024 (PDF/UA-2) for PDF v2.0 (ISO 32000-2). The ISO 14289 specifications have been sponsored via the PDF Association and are available for free download from https://pdfa.org/accessibility/, along with other documentation resources that should be helpful for building tools that create and manipulate accessible PDFs.
Both versions build upon the Tagged PDF (14.8) and Accessibility Support (14.9) clauses of their respective versions of the ISO 32000 PDF specification, which in turn build upon the preceding Logical Structure clause (14.7). However, in PDF 2.0, the Tagged PDF and Accessibility Support clauses were substantially rewritten compared to ISO 32000-1, so an accessible v1.7 PDF can be rather different from an accessible v2.0 PDF. The newer PDF/UA-2 specification benefits from the years of effort put into improving both PDF itself and PDF/UA, but its major weakness is that implementations and tool support for it still lag behind. This is slowly being rectified, but for now, both standards might need to be supported.
APDFL's C interface offers robust support for PDF's Logical Structure through its PDF Structure Edit Layer API calls. These can be used to create or manipulate PDF tagging, even though there are no additional APIs specifically for enforcing the rules of Tagged PDF (which changed somewhat between ISO 32000-1 and ISO 32000-2).
A key requirement for accessibility is that all real content must be tagged and everything else marked as an artifact, which can be ignored. In practical terms with APDFL, this means that most PDEElements in a PDEContent need to be enclosed within a PDEContainer, and that PDEContainer must either be given an MCTag of "Artifact" or be associated with a Block or Inline Element via the PDSElementInsertMCAsKid call. There are more details, to be sure, but that's the crux of how PDEContent becomes tagged PDF.
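The rule above can be illustrated with a small, self-contained model in C. This is only a sketch of the logic and does not call APDFL: the ContentElem type, its fields, and the count_untagged function are hypothetical stand-ins invented for this example, with has_struct_parent standing for "this container was associated with a structure element" (the role PDSElementInsertMCAsKid plays in APDFL). The model also treats every non-container element as needing a tag, whereas the real rule covers most, not all, element types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT APDFL types. */
typedef enum { ELEM_TEXT, ELEM_PATH, ELEM_IMAGE, ELEM_CONTAINER } ElemKind;

typedef struct ContentElem {
    ElemKind kind;
    const char *mc_tag;              /* marked-content tag (containers only) */
    int has_struct_parent;           /* 1 if linked to a structure element */
    const struct ContentElem *kids;  /* children (containers only) */
    size_t num_kids;
} ContentElem;

/* A container covers its content if it is an Artifact or is linked
 * into the structure tree. */
static int container_is_accounted_for(const ContentElem *c)
{
    assert(c != NULL && c->kind == ELEM_CONTAINER);
    if (c->mc_tag != NULL && strcmp(c->mc_tag, "Artifact") == 0)
        return 1;
    return c->has_struct_parent != 0;
}

/* Count elements that are neither tagged nor marked as artifacts.
 * Containers that are not accounted for are searched recursively,
 * since their children may still sit inside a nested tagged container. */
size_t count_untagged(const ContentElem *elems, size_t n)
{
    size_t untagged = 0;
    for (size_t i = 0; i < n; i++) {
        const ContentElem *e = &elems[i];
        if (e->kind != ELEM_CONTAINER)
            untagged++;                       /* bare element: not covered */
        else if (!container_is_accounted_for(e))
            untagged += count_untagged(e->kids, e->num_kids);
        /* accounted-for container: everything inside it is covered */
    }
    return untagged;
}
```

A result of zero from count_untagged means every element in the model is either tagged or an artifact. A checker written against APDFL would presumably walk a page's PDEContent in a similar way, but the exact calls should be taken from the APDFL reference rather than from this sketch.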