
Text in PDF files is more complicated than it looks. What you see when you view a file are pictures that look like letters and that we read together as words. They are not just pictures, though: the PDF file also contains information stating which letter each picture represents. This is referred to as the “encoding” of text inside a document.

PDF Alchemist relies on this encoding information, which ties the visual “pictures” of letters and words to the text they represent, when it converts text to HTML format. Some PDF files deliberately scramble this information to prevent users from copying or reusing their text. Other PDF files contain inaccurate information, or none at all, for some or all of the letters inside. In these cases, PDF Alchemist does not have enough information to determine what the text is and cannot convert it correctly.
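
The three situations described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the glyph IDs, the mapping tables, and the `decode_glyphs` helper are all hypothetical and are not part of PDF Alchemist or of any real PDF parser. It simply shows how the same pictures on a page can yield correct, scrambled, or incomplete text depending on the encoding information stored with them.

```python
# Toy model only: glyph IDs, tables, and decode_glyphs are hypothetical
# illustrations, not PDF Alchemist's API and not real PDF parsing.

PLACEHOLDER = "?"  # stands in for a letter whose meaning is unknown


def decode_glyphs(glyph_ids, encoding):
    """Translate each glyph ID to a letter using the given encoding table.

    Glyphs with no entry in the table become PLACEHOLDER, because the
    picture alone does not say which letter it represents.
    """
    return "".join(encoding.get(glyph, PLACEHOLDER) for glyph in glyph_ids)


# The page draws three pictures that a reader sees as "Cat".
glyph_ids = [1, 2, 3]

accurate = {1: "C", 2: "a", 3: "t"}   # encoding matches what is drawn
scrambled = {1: "X", 2: "q", 3: "#"}  # deliberately misleading encoding
partial = {1: "C"}                    # encoding missing for some letters

correct_text = decode_glyphs(glyph_ids, accurate)
scrambled_text = decode_glyphs(glyph_ids, scrambled)
partial_text = decode_glyphs(glyph_ids, partial)

print(correct_text)
print(scrambled_text)
print(partial_text)
```

In all three cases the page looks identical on screen; only the first table lets a converter recover the word the reader actually sees.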
