How to extract key-value objects
To extract key-value pairs from a document, do the following:
- Create a new field using "Add FIELD with auto KEY-VALUE extraction".
- Make sure "Regex" is enabled for this new field.
- Put the following expression into the "expression" property:
(?<key>{{SentenceWithSingleSpaces}}): (?<value>{{SentenceWithSingleSpaces}})
What it means:
- ?<key> tells the parser that the macro next to it captures the name of the field (i.e. the key). Important: if the expression containing ?<key> matches multiple times, it generates multiple objects as output, one for every match.
- ?<value> tells the parser that the macro next to it captures the value.
- {{SentenceWithSingleSpaces}} is the macro that captures a sentence with single spaces inside. The full list of macros is documented separately.
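To see how the named groups behave outside the parser, the expression can be approximated with an ordinary regex. This is a minimal sketch, not the parser's implementation: the SENTENCE pattern is an assumed stand-in for {{SentenceWithSingleSpaces}} (the real macro expansion may differ), and extract_fields is a hypothetical helper name. It illustrates that every match of the expression yields its own key/value object.

```python
import re

# Assumption: a plain-regex stand-in for {{SentenceWithSingleSpaces}};
# the parser's real macro expansion may differ.
SENTENCE = r"\S+(?: \S+)*"

# Python spells named groups (?P<name>...) where the template uses (?<name>...).
PATTERN = re.compile(rf"(?P<key>{SENTENCE}): (?P<value>{SENTENCE})")


def extract_fields(text, page_index=0):
    """Return one field object per match of PATTERN in text (hypothetical helper)."""
    return [
        {
            "name": m.group("key"),
            "objectType": "field",
            "value": m.group("value"),
            "pageIndex": page_index,
        }
        for m in PATTERN.finditer(text)
    ]


sample = "Name: Alfred Pennyworth\nID: 000012345\nDOB: 8/16/1943 (78 years)"
for obj in extract_fields(sample):
    print(obj["name"], "=>", obj["value"])
```

Because neither \S nor the literal space matches a line break, each match stays on a single line, so three input lines produce three separate objects.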
Sample:
Input text:
Name: Alfred Pennyworth
ID: 000012345
DOB: 8/16/1943 (78 years)
The generated JSON output:
{
"objects": [
{
"name": "Name",
"objectType": "field",
"value": "Alfred Pennyworth",
"pageIndex": 0
},
{
"name": "ID",
"objectType": "field",
"value": "000012345",
"pageIndex": 0
},
{
"name": "DOB",
"objectType": "field",
"value": "8/16/1943 (78 years)",
"pageIndex": 0
}
...
CSV output:
Name,ID,DOB
Alfred Pennyworth,000012345,8/16/1943 (78 years),
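The relationship between the JSON objects and the CSV layout can be sketched as follows: field names become the header row and field values become the data row. This is an illustration only, assuming one record per document; objects_to_csv is a hypothetical helper, not part of the parser, and it does not reproduce the trailing comma seen in the sample CSV.

```python
import csv
import io

# The three field objects from the sample, reduced to the keys used here.
objects = [
    {"name": "Name", "value": "Alfred Pennyworth"},
    {"name": "ID", "value": "000012345"},
    {"name": "DOB", "value": "8/16/1943 (78 years)"},
]


def objects_to_csv(field_objects):
    """Write names as the header row and values as a single data row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([obj["name"] for obj in field_objects])
    writer.writerow([obj["value"] for obj in field_objects])
    return buffer.getvalue()


csv_text = objects_to_csv(objects)
print(csv_text, end="")
```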