PDF to XML, PDF to JSON, PDF to CSV - Forcing font repair when extracting text from PDF with malformed fonts
For Zapier, Integromat, and other plugins, insert custom profiles into the profiles field. For API calls, set the value of the profiles parameter as a string.
Some PDF files contain malformed fonts with a customized character map, which makes converting the content back to text impossible. To confirm that this is the case, open the document in Adobe Reader, select some text with the mouse, and copy and paste it into any text editor. If the fonts are malformed, the same garbled text is copied.
If you need to extract the text from this file at any cost, you can try the special mode that recognizes text using OCR:
{ "OCRMode": "TextFromImagesAndVectorsAndRepairedFonts" }
Applies To:
/pdf/convert/to/csv
/pdf/convert/to/xml
/pdf/convert/to/json
/pdf/convert/to/xls
/pdf/convert/to/xlsx
Note: this mode is much slower than regular extraction, so you should perform requests asynchronously.
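As a rough illustration of the points above, here is a minimal Python sketch that builds a request body with the profile passed as a string. The helper name build_request and the sample file URL are hypothetical, and the "url" and "async" parameter names are assumptions not stated in this article, so check them against the API reference before relying on them. The sketch only constructs the payload and does not send anything.

```python
import json

# Endpoints listed under "Applies To" in this article.
SUPPORTED_ENDPOINTS = {
    "/pdf/convert/to/csv",
    "/pdf/convert/to/xml",
    "/pdf/convert/to/json",
    "/pdf/convert/to/xls",
    "/pdf/convert/to/xlsx",
}


def build_request(endpoint, source_url):
    """Build a request body that forces font repair (hypothetical helper)."""
    if endpoint not in SUPPORTED_ENDPOINTS:
        raise ValueError(f"font repair profile is not documented for {endpoint}")

    profiles = {"OCRMode": "TextFromImagesAndVectorsAndRepairedFonts"}

    return {
        # Assumed parameter name for the source document location.
        "url": source_url,
        # The profile is serialized to a string, not sent as a nested object.
        "profiles": json.dumps(profiles),
        # Assumed parameter name for requesting asynchronous processing.
        "async": True,
    }


payload = build_request("/pdf/convert/to/json", "https://example.com/sample.pdf")
print(json.dumps(payload, indent=2))
```

Serializing the profile with json.dumps keeps the quoting valid and avoids hand-written escaping when the string is embedded in the outer request body.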