PDF to XML, PDF to JSON, PDF to CSV - Forcing font repair when extracting text from PDF with malformed fonts

For Zapier, Integromat, and other plugins, insert custom profiles into the profiles field. For API calls, pass the value as a string in the profiles parameter.

Some PDF files contain malformed fonts with a customized character map, which makes converting the characters back to text impossible. To confirm that this is the case, open the document in Adobe Reader, select some text with the mouse, and copy-paste it into any text editor. If the fonts are malformed, the pasted text is garbled in the same way.

If you need to extract the text from this file at any cost, you can try the special mode that recognizes text using OCR:

{ "OCRMode": "TextFromImagesAndVectorsAndRepairedFonts" }

Applies To:

  • /pdf/convert/to/csv
  • /pdf/convert/to/xml
  • /pdf/convert/to/json
  • /pdf/convert/to/xls
  • /pdf/convert/to/xlsx

Note: this mode is much slower than regular extraction, so you should perform requests asynchronously.
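
For API calls, the two points above (the profile passed as a string, and the request made asynchronously) can be combined in one request body. The following is a minimal sketch in Python rather than an official client: the helper name build_font_repair_request is hypothetical, the parameter names "url" and "async" are assumptions not stated in this article, and the sample file URL is a placeholder. Only the profiles parameter, the profile itself, and the endpoint list come from the text above.

```python
import json

# Endpoints listed under "Applies To" in this article.
SUPPORTED_ENDPOINTS = (
    "/pdf/convert/to/csv",
    "/pdf/convert/to/xml",
    "/pdf/convert/to/json",
    "/pdf/convert/to/xls",
    "/pdf/convert/to/xlsx",
)

# The profile from this article, kept as a dict until it is serialized.
FONT_REPAIR_PROFILE = {"OCRMode": "TextFromImagesAndVectorsAndRepairedFonts"}


def build_font_repair_request(endpoint, pdf_url):
    """Return a request body for font-repair extraction (hypothetical helper)."""
    if endpoint not in SUPPORTED_ENDPOINTS:
        raise ValueError(f"font repair profile is not documented for {endpoint}")
    return {
        "url": pdf_url,   # ASSUMPTION: name of the source-file parameter
        "async": True,    # ASSUMPTION: name of the async flag; this mode is slow
        # The profile is sent as a string, not as a nested object.
        "profiles": json.dumps(FONT_REPAIR_PROFILE),
    }


body = build_font_repair_request(
    "/pdf/convert/to/json",
    "https://example.com/sample.pdf",  # placeholder file URL
)
print(json.dumps(body, indent=2))
```

Serializing the profile with json.dumps, instead of writing the string by hand, avoids quoting mistakes when the body itself is later encoded as JSON.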
