PDF to Text with scanned documents: configuring OCR text corrections
For Zapier, Integromat, and other plugins, insert custom profiles into the profiles field. For API calls, pass the value as a string in the profiles parameter.
We can manually configure corrections for known OCR errors when extracting data from a PDF, as in the following profile.
{
  "profiles": [
    {
      "options": {
        "TrimSpaces": "False",
        "PreserveFormattingOnTextExtraction": "True",
        "Unwrap": "True"
      }
    },
    {
      "correction1": {
        "OCRCorrections.Add()": [ "Test", "XXXX", false ]
      }
    },
    {
      "correction2": {
        "OCRCorrections.Add()": [ "OCR", "YYY", false ]
      }
    }
  ]
}
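For API calls, the profile above has to be sent as a string rather than as a nested object. The following is a minimal Python sketch of that serialization step. The "url" field name and the example.com address are assumptions made for illustration, not confirmed parts of the API; only the profiles parameter and its contents come from the text above.

```python
import json

# The profile from the example above, expressed as a Python structure.
profiles = {
    "profiles": [
        {
            "options": {
                "TrimSpaces": "False",
                "PreserveFormattingOnTextExtraction": "True",
                "Unwrap": "True",
            }
        },
        {"correction1": {"OCRCorrections.Add()": ["Test", "XXXX", False]}},
        {"correction2": {"OCRCorrections.Add()": ["OCR", "YYY", False]}},
    ]
}


def build_request_body(source_url):
    """Build a request body whose profiles value is a JSON-encoded string."""
    return {
        "url": source_url,  # assumption: hypothetical field name for the input PDF
        "profiles": json.dumps(profiles),  # a string, not a nested object
    }


body = build_request_body("https://example.com/scanned.pdf")  # placeholder URL
print(body["profiles"])
```

Serializing with json.dumps also takes care of escaping, which matters once a correction contains backslashes or quotes.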
We can also specify regex-based corrections, as in the following example, where the third value is set to true. It handles cases where a date such as 11/03/2018 is recognized as 11V03V2018.
{ "OCRCorrections.Add()": [ "(\\d)V(\\d)", "$1\\/$2", true] }
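To check what this pattern matches before adding it to a profile, it can be tried locally. The sketch below is a rough emulation using Python's re module, not the OCR engine itself: Python writes group references as \1 and \2 where the profile uses $1 and $2, and the function name emulate_correction is hypothetical.

```python
import json
import re

# The profile line from the text, kept as raw JSON.
profile_text = r'{ "OCRCorrections.Add()": [ "(\\d)V(\\d)", "$1\\/$2", true] }'
pattern, _replacement, is_regex = json.loads(profile_text)["OCRCorrections.Add()"]

# JSON decoding turns each doubled backslash into a single one.
assert pattern == r"(\d)V(\d)"
assert is_regex is True


def emulate_correction(text):
    """Replace a 'V' that sits between two digits with a slash (Python emulation)."""
    # Python-style replacement string; the profile's own "$1\/$2" is not used here.
    return re.sub(pattern, r"\1/\2", text)


print(emulate_correction("11V03V2018"))  # 11/03/2018
```

Because the pattern requires a digit on both sides of the V, ordinary words containing a V are left unchanged.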
Applies To:
/pdf/convert/to/csv
/pdf/convert/to/xml
/pdf/convert/to/json
/pdf/convert/to/xls
/pdf/convert/to/xlsx