How can I find all the links in a PDF document?
Hyperlinks direct a reader to a different part of the same PDF document, or connect the reader to an external web page. In PDF, a link is a type of annotation. So to compile a list of all of the links in a PDF document, you would list the objects with Type “Annot” and Subtype “Link.” For each link found, you then need to check its defined Action to determine whether the link points to a URI/URL, as opposed to a GoTo action that simply jumps to a destination elsewhere in the document.
Adobe PDF Library does not offer a method that lists all of the links in a PDF document in a single call, but the steps below describe one way to do it.
First, you need to inspect each page of the document to see if it has any annotations on it. So after you acquire the page, you would call PDPageGetNumAnnots to find out how many annotations the page has, and PDPageGetAnnot to retrieve each one. Then, inspect each annotation with PDAnnotGetSubtype to determine whether it is a link.
The following sample code summarizes this sequence:
for (i = 0; i < PDPageGetNumAnnots(pdPage); i++) {
    annot = PDPageGetAnnot(pdPage, i);
    if (PDAnnotIsValid(annot) &&
        PDAnnotGetSubtype(annot) == ASAtomFromString("Link")) {
        ...
    }
}
Then, you need to identify the link annotations whose action is a URI action, which references an external URL, rather than a GoTo action, which takes the reader to another page within the same document.
See Section 12.6.4, “Action Types,” in the ISO 32000 Reference, page 417.
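The filtering step described above can be sketched in plain C. This is only an illustration of the decision logic, not Adobe PDF Library code: the Annot and LinkAction structs and the is_uri_link and collect_uri_links functions are hypothetical stand-ins that model the annotation Subtype and the action's S and URI entries as simple strings. In a real program you would read those values through the library's annotation and action calls (check the API reference for the exact functions) instead of filling in structs by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a link's action dictionary.
   These are NOT Adobe PDF Library types. */
typedef struct {
    const char *subtype; /* value of the S key, e.g. "URI" or "GoTo"; NULL if no action */
    const char *uri;     /* value of the URI key when subtype is "URI" */
} LinkAction;

/* Hypothetical stand-in for one annotation on a page. */
typedef struct {
    int page;            /* 1-based number of the page holding the annotation */
    const char *subtype; /* annotation Subtype, e.g. "Link" or "Text" */
    LinkAction action;   /* the annotation's action, if any */
} Annot;

/* Returns 1 if the annotation is a Link whose action is a URI action, else 0. */
static int is_uri_link(const Annot *a)
{
    if (a == NULL || a->subtype == NULL || strcmp(a->subtype, "Link") != 0)
        return 0;                      /* not a link annotation */
    if (a->action.subtype == NULL)
        return 0;                      /* link without an action */
    return strcmp(a->action.subtype, "URI") == 0 && a->action.uri != NULL;
}

/* Stores pointers to the URI links found in annots[0..n) into out,
   up to max entries, and returns how many were stored. */
static size_t collect_uri_links(const Annot *annots, size_t n,
                                const Annot **out, size_t max)
{
    size_t found = 0;
    size_t i;

    assert(out != NULL || max == 0);
    for (i = 0; i < n && found < max; i++) {
        if (is_uri_link(&annots[i]))
            out[found++] = &annots[i];
    }
    return found;
}
```

Given a list holding a URI link on page 1, a Text annotation, a GoTo link, and a URI link on page 3, collect_uri_links would keep only the two URI links. Keeping the test in a small predicate like is_uri_link makes it easy to add other action types later, such as Launch or GoToR, if you also want to report those.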
Finally, you should also examine the DLI sample application Annotations, found in your Adobe PDF Library \DLI\Samples folder. This program generates a simple three-page document containing a variety of annotation types. You can also modify its source with your own data to test it further.