Understanding PDF Merging Issues with Structure Trees
Estimated Reading Time: 2 Minutes
The Details
When merging tagged PDF documents, such as PDF/UA documents, that contain large structure trees, users have reported that the merge generates very large temporary files. These files can exhaust the available disk space, potentially causing the operation to hang indefinitely or fail altogether.
These structure trees are vital for ensuring that the documents comply with the PDF/UA standard, which is essential for accessibility. So although removing the structure trees before merging is a possible workaround, it is not an ideal solution. An alternative is to pass the PDInsertDoNotResolveInvalidStructureParentReferences flag to the PDDocInsertPages call. This flag turns off an additional routine that checks, and attempts to fix, certain invalid StructureElement node dictionaries. If a well-formed PDF/UA document is inserted into another well-formed PDF/UA document, the result should be a well-formed PDF/UA document whether the flag is on or off. Note, however, that inserting untagged PDF into a PDF/UA-compliant document, or the reverse, almost automatically yields a document that is not PDF/UA compliant, because compliance requires all content to be tagged appropriately.
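The flag handling described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the library's actual declarations: the numeric flag values are placeholders, insert_pages_stub is a hypothetical stand-in for PDDocInsertPages (whose real parameter list should be taken from the library headers), and merge_flags and merge_tagged are helper names invented for this example. It shows only how the flag is OR-ed into the insert flags before the call.

```c
/* Placeholder flag values so the sketch compiles without the SDK.
 * These are NOT the library's real values; in a real project the
 * constants come from the library headers. */
enum {
    PDInsertBookmarks = 0x0001,                                     /* placeholder */
    PDInsertDoNotResolveInvalidStructureParentReferences = 0x0010   /* placeholder */
};

/* Records the flags the stub was last called with. */
static unsigned last_flags_seen;

/* Hypothetical stand-in for PDDocInsertPages. The real function also
 * takes the two documents, page positions, and callbacks; only the
 * insert-flags argument is modeled here. */
static void insert_pages_stub(unsigned insert_flags)
{
    last_flags_seen = insert_flags;
}

/* Hypothetical helper: add or clear the "do not resolve" bit on top of
 * whatever other insert flags the caller already wants. */
static unsigned merge_flags(unsigned base_flags, int skip_struct_parent_fixup)
{
    const unsigned bit = PDInsertDoNotResolveInvalidStructureParentReferences;
    return skip_struct_parent_fixup ? (base_flags | bit) : (base_flags & ~bit);
}

/* Hypothetical wrapper: merge with bookmarks, optionally skipping the
 * structure-parent check-and-fix routine. */
static void merge_tagged(int skip_struct_parent_fixup)
{
    insert_pages_stub(merge_flags(PDInsertBookmarks, skip_struct_parent_fixup));
}
```

In real code, the combined value would be passed as the insert-flags argument of PDDocInsertPages. Keeping the choice in one helper makes it easy to turn the check back on when the source documents are not known to be well formed.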
Resolution Summary
The issue was addressed through a better understanding of the PDInsertDoNotResolveInvalidStructureParentReferences flag. This flag controls whether the library attempts to resolve invalid structure parent references when inserting pages from one PDF into another. The key takeaway from the support response is that enabling this flag can improve performance when merging large documents, because it skips additional structural-integrity checks that are unnecessary in many cases.
All in all, we recommend enabling the PDInsertDoNotResolveInvalidStructureParentReferences flag when using the PDDocInsertPages function to improve performance during the merging process.
How to Get Additional Help
If you require further assistance or have additional questions, do not hesitate to reach out for support. You can contact the technical support team via email at tech_support@datalogics.com. For more resources and documentation, please visit our website at datalogics.com and our documentation site at docs.datalogics.com.
It is essential to stay informed about the latest updates and practices when working with PDF documents, especially for those handling accessibility features. Regularly checking the documentation and support forums can provide additional insights and help mitigate similar issues in the future.