How can I find all the links in a PDF document?
Hyperlinks direct a user to a different part of the same PDF document, or connect the user to an external web page. In PDF, a link is a type of annotation. So to compile a list of all of the links in a PDF document, you would search for annotations of Type “Annot” and Subtype “Link.” For each link found, you then need to check whether it has a defined Action, and whether that Action points to a URI/URL, as opposed to the GoTo types that simply jump to a destination elsewhere in the document.
Adobe PDF Library does not offer a method that will list all of the links found in a PDF document in a single call. To obtain the links, you need to inspect each page of the document to see if it has any Annotations on it:
- Acquire the page
- Call PDPageGetNumAnnots to get the number of annotations on the page, and PDPageGetAnnot to retrieve each one
- Inspect each annotation with PDAnnotGetSubtype to determine if an annotation is a link
The following sample code summarizes this sequence:
    for (i = 0; i < PDPageGetNumAnnots(pdPage); i++) {
        annot = PDPageGetAnnot(pdPage, i);
        if (PDAnnotIsValid(annot) && PDAnnotGetSubtype(annot) == ASAtomFromString("Link")) {
            ...
        }
    }
Then, among the link annotations, you need to identify those whose action is a URI action that references a URL, rather than a GoTo action that takes the reader to another page within the same document.
See Section 12.6.4, “Action Types,” in the ISO 32000 Reference, page 417.
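The filtering logic described above can be sketched in plain C, independent of the library. This is only a model under stated assumptions: ModelAnnot, ModelPage, is_uri_link, and collect_uri_links are hypothetical names invented for illustration and are not Adobe PDF Library types or calls. The struct fields stand in for the annotation's Subtype, the action's type (the S entry of the action dictionary), and the action's URI entry. In real code you would fill in these values from the library's annotation and action objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a PDF annotation; NOT an Adobe PDF Library type. */
typedef struct {
    const char *subtype;      /* annotation Subtype, e.g. "Link" or "Text" */
    const char *action_type;  /* action type, e.g. "URI" or "GoTo"; NULL if no action */
    const char *uri;          /* target of a URI action; NULL otherwise */
} ModelAnnot;

/* Hypothetical stand-in for a page and its annotation array. */
typedef struct {
    const ModelAnnot *annots;
    size_t num_annots;
} ModelPage;

/* Returns 1 if the annotation is a Link with a URI action, 0 otherwise. */
int is_uri_link(const ModelAnnot *a)
{
    assert(a != NULL);
    if (a->subtype == NULL || strcmp(a->subtype, "Link") != 0)
        return 0;                     /* not a link annotation */
    if (a->action_type == NULL)
        return 0;                     /* link without an action */
    if (strcmp(a->action_type, "URI") != 0)
        return 0;                     /* GoTo or another action type */
    return a->uri != NULL;
}

/*
 * Visits every annotation on every page. Stores up to max_out URI
 * strings in out and returns the total number of URI links found,
 * which may exceed max_out.
 */
size_t collect_uri_links(const ModelPage *pages, size_t num_pages,
                         const char **out, size_t max_out)
{
    size_t found = 0;
    for (size_t p = 0; p < num_pages; p++) {
        for (size_t i = 0; i < pages[p].num_annots; i++) {
            const ModelAnnot *a = &pages[p].annots[i];
            if (!is_uri_link(a))
                continue;
            if (found < max_out)
                out[found] = a->uri;
            found++;
        }
    }
    return found;
}
```

The outer loop corresponds to acquiring each page, the inner loop to the PDPageGetNumAnnots / PDPageGetAnnot walk, and is_uri_link to the Subtype check followed by the action-type check.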