Can the PDF Library extract individual images from a PDF document?

An image XObject is a COS stream object. The stream dictionary holds entries that give the number of rows and the width of a row in pixels. Other entries describe the color model in use and the line and row progression. Image XObjects are described in Section 8.9.5, “Image Dictionaries,” in the ISO 32000 Reference. Note that a PDF page can be composed of multiple individual images as well as path and text content.

Images can be extracted from a document by recursing through a page's content stream and collecting the PDEImage elements. These can be exported to supported image file formats with DLExportPDEImage(). See the ImageExport sample.
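The recursion described above can be sketched without the library itself. The following is a minimal illustration that assumes a simplified, hypothetical element tree: the Element struct, the ElemKind values, and collect_images() are invented for this example and are not PDF Library types or calls. A "group" stands in for any element that nests further content. In a real application the traversal would use the library's own content and element accessors, as the ImageExport sample does.

```c
#include <stddef.h>

/* Hypothetical element kinds: a stand-in for the library's element types. */
typedef enum { ELEM_TEXT, ELEM_PATH, ELEM_IMAGE, ELEM_GROUP } ElemKind;

/* Hypothetical content element. A group models any element that
   nests further content inside it. */
typedef struct Element {
    ElemKind kind;
    const struct Element *children; /* used only when kind == ELEM_GROUP */
    size_t child_count;
} Element;

/* Walks elems depth-first and stores pointers to image elements in out.
   'found' is the number of images collected so far (pass 0 at the top level).
   Returns the total number of images found, even if out is too small. */
size_t collect_images(const Element *elems, size_t count,
                      const Element **out, size_t out_cap, size_t found)
{
    for (size_t i = 0; i < count; i++) {
        const Element *e = &elems[i];
        if (e->kind == ELEM_IMAGE) {
            if (found < out_cap) {
                out[found] = e;
            }
            found++;
        } else if (e->kind == ELEM_GROUP) {
            /* Nested content: recurse and carry the running count along. */
            found = collect_images(e->children, e->child_count,
                                   out, out_cap, found);
        }
        /* Text and path elements are skipped. */
    }
    return found;
}
```

Returning the total count even when the output array is full lets a caller detect truncation and retry with a larger buffer.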

Alternatively, if you are working at the COS layer, you can use the information from the stream dictionary to form the header of an image in a standard format. The bitmap data can then be accessed by opening the stream with CosStreamOpenStm() and reading from it with ASStmRead(). Formatting this information from the COS layer into a standard image format is the responsibility of your application.
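As an illustration of forming a header from stream dictionary values, the sketch below builds a binary PNM header (P6 for RGB, P5 for grayscale). It makes several assumptions: the application has already read Width, Height, and BitsPerComponent from the dictionary, the samples are decoded 8-bit DeviceRGB or DeviceGray data, and no PDF Library calls are made. The ImageInfo struct and both function names are hypothetical and exist only for this example. Other color spaces, bit depths, or filtered data would need conversion first.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical holder for values the application has already read
   from the image XObject's stream dictionary. */
typedef struct {
    int width;              /* Width: pixels per row */
    int height;             /* Height: number of rows */
    int bits_per_component; /* BitsPerComponent */
    int components;         /* 3 for DeviceRGB, 1 for DeviceGray */
} ImageInfo;

/* Writes a binary PNM header into buf.
   Returns the header length in bytes, or -1 if the image is not
   covered by this sketch or the buffer is too small. */
int format_pnm_header(const ImageInfo *info, char *buf, size_t buf_size)
{
    const char *magic;
    int n;

    if (info == NULL || buf == NULL) return -1;
    if (info->width <= 0 || info->height <= 0) return -1;
    if (info->bits_per_component != 8) return -1; /* only 8-bit samples here */

    if (info->components == 3) {
        magic = "P6"; /* binary RGB */
    } else if (info->components == 1) {
        magic = "P5"; /* binary grayscale */
    } else {
        return -1;
    }

    n = snprintf(buf, buf_size, "%s\n%d %d\n255\n",
                 magic, info->width, info->height);
    if (n < 0 || (size_t)n >= buf_size) return -1;
    return n;
}

/* Number of decoded sample bytes expected to follow the header.
   With 8-bit components each row is already byte-aligned. */
size_t expected_data_size(const ImageInfo *info)
{
    return (size_t)info->width * (size_t)info->height
         * (size_t)info->components;
}
```

An application would write the header to the output file and then append the sample bytes read from the stream, checking that the byte count matches expected_data_size().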

 

.NET and Java

PDEImage objects are represented by the Image class. These can be exported to image file formats using the Save() method.
