PDF to Text with scanned documents: configuring OCR text corrections


For Zapier, Integromat, and other plugins, insert custom profiles into the profiles field. For API calls, pass the profiles value as a string in the profiles parameter.

We can manually configure corrections for known OCR errors when extracting data from a PDF, as shown below.

{
    "profiles": [
        {
            "options": {
                "TrimSpaces": "False",
                "PreserveFormattingOnTextExtraction": "True",
                "Unwrap": "True"
            }
        },
        {
            "correction1": {
                "OCRCorrections.Add()": [ "Test", "XXXX", false]
            }
        },
        {
            "correction2": {
                "OCRCorrections.Add()": [ "OCR", "YYY", false]
            }
        }
    ]
}
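Because API calls expect the profiles value as a string rather than as a nested JSON object, one way to prepare it is to build the profile in code and serialize it. The Python sketch below assumes the whole object shown above is what goes into the profiles parameter; the `request_body` dict is hypothetical, and any other request parameters (source file URL, etc.) depend on the endpoint you call and are not shown here.

```python
import json

# The profile from the example above, expressed as a Python structure.
profile = {
    "profiles": [
        {"options": {
            "TrimSpaces": "False",
            "PreserveFormattingOnTextExtraction": "True",
            "Unwrap": "True",
        }},
        {"correction1": {"OCRCorrections.Add()": ["Test", "XXXX", False]}},
        {"correction2": {"OCRCorrections.Add()": ["OCR", "YYY", False]}},
    ]
}

# API calls expect the profiles value as a string, so serialize it
# instead of embedding it as a nested object.
profiles_string = json.dumps(profile)

# Hypothetical request body: only the "profiles" key comes from this article;
# add whatever other parameters your chosen endpoint requires.
request_body = {"profiles": profiles_string}

print(json.dumps(request_body, indent=2))
```

Note that `json.dumps` converts Python's `False` to JSON's `false`, so the serialized string matches the syntax of the example above.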

We can also specify regex-based corrections, as in the following example. This is useful when, for instance, a date like 11/03/2018 is recognized as 11V03V2018.

{ "OCRCorrections.Add()": [ "(\\d)V(\\d)", "$1\\/$2", true] }
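To check what a set of corrections would do before adding them to a profile, they can be simulated locally. The Python sketch below is a hypothetical approximation, not the service's implementation: it treats each entry as (find, replace, is_regex), mirroring the three array values, and assumes corrections are applied in order and that plain-text matches are case-sensitive. Python's `re` module writes group references as `\1` rather than `$1`, so the regex replacement is written in Python syntax here.

```python
import re

# Hypothetical local stand-ins for OCRCorrections.Add() entries:
# (find, replace, is_regex). Replacements use Python's \1, \2 syntax,
# not the $1/$2 syntax used in the profile.
corrections = [
    ("Test", "XXXX", False),
    ("OCR", "YYY", False),
    (r"(\d)V(\d)", r"\1/\2", True),
]


def apply_corrections(text, corrections):
    """Apply each correction in order: literal replace or regex substitution."""
    for find, replace, is_regex in corrections:
        if is_regex:
            text = re.sub(find, replace, text)
        else:
            text = text.replace(find, replace)
    return text


print(apply_corrections("OCR Test date: 11V03V2018", corrections))
# prints: YYY XXXX date: 11/03/2018
```

The regex only replaces a `V` that sits between two digits, so ordinary words containing the letter V are left untouched.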

Applies To:

  • /pdf/convert/to/csv
  • /pdf/convert/to/xml
  • /pdf/convert/to/json
  • /pdf/convert/to/xls
  • /pdf/convert/to/xlsx