After you process a PDF document with PDF Alchemist, you may find that some content near the top or bottom of a page is missing.

PDF Alchemist uses heuristics to detect and remove content that belongs to the header or footer of a PDF page. It assumes that such text, for example a page number or date, is an artifact of pagination introduced when the PDF was generated, and so discards it by default. Text like "Page 7" comes from the footer and is not part of the text that you want to convert to HTML.

Sometimes, however, PDF Alchemist mistakes genuine content at the top or bottom of a PDF page for header or footer content and removes it as well, even though you would rather preserve it.
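
To make this failure mode concrete, here is a minimal sketch of a position-based header/footer heuristic. It is an illustration only and is not PDF Alchemist's actual logic: the `TextLine` type, the `margin` threshold, and the `enabled` switch are all hypothetical names chosen for this example. It assumes each line of text carries a vertical position expressed as a fraction of the page height.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    page: int    # 1-based page number
    y: float     # vertical position: 0.0 = top of page, 1.0 = bottom
    text: str


def is_margin_artifact(line: TextLine, margin: float = 0.08) -> bool:
    """Hypothetical rule: anything inside the top or bottom margin band
    is treated as a header or footer."""
    return line.y < margin or line.y > 1.0 - margin


def strip_artifacts(lines, margin: float = 0.08, enabled: bool = True):
    """Drop lines flagged as pagination artifacts.
    With enabled=False, every line is kept (hypothetical switch)."""
    if not enabled:
        return list(lines)
    return [ln for ln in lines if not is_margin_artifact(ln, margin)]


lines = [
    TextLine(7, 0.05, "Quarterly Report"),                        # running header
    TextLine(7, 0.07, "This sentence starts high on the page."),  # real content
    TextLine(7, 0.50, "Body text in the middle of the page."),
    TextLine(7, 0.96, "Page 7"),                                  # footer
]

kept_default = [ln.text for ln in strip_artifacts(lines)]
kept_disabled = [ln.text for ln in strip_artifacts(lines, enabled=False)]
```

In this sketch the header and "Page 7" are removed as intended, but the sentence that happens to start inside the top margin band is removed too. Turning the heuristic off keeps all four lines, at the cost of leaving the header and page number in the output.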

If this happens, or if maintaining page content is more important to your workflow than removing page numbers, headers, footers and other pagination artifacts, you may want to disable the header/footer artifact logic with one of the following options:
