PDF.co Document Parser Template Creation Guide

|{{SmartDate}} |Tries to detect the date in the most common formats.|
|{{Number}} |Decimal number like the following: "12.34", "-123,456.78", "123.456". Decimal separator and thousands separator are automatically taken from the template culture.|
|{{Money}} |Decimal number with currency symbol like the following: "USD 12.34", "$123,456.78", "123.45 €". Decimal separator and thousands separator are automatically taken from the template culture.|
|{{USPhoneNumber}} |Tries to detect US phone number.|
|{{Space}} |Single space.|
|{{Spaces}} |One or more spaces.|
|{{2Spaces}} |Two spaces.|
|{{3Spaces}} |Three spaces.|
|{{4Spaces}} |Four spaces.|
|{{5Spaces}} |Five spaces.|
|{{6Spaces}} |Six spaces.|
|{{7Spaces}} |Seven spaces.|
|{{8Spaces}} |Eight spaces.|
|{{9Spaces}} |Nine spaces.|
|{{10Spaces}} |Ten spaces.|
|{{Digit}} |One digit.|
|{{Digits}} |One or more digits.|
|{{2Digits}} |Two digits.|
|{{3Digits}} |Three digits.|
|{{4Digits}} |Four digits.|
|{{5Digits}} |Five digits.|
|{{6Digits}} |Six digits.|
|{{7Digits}} |Seven digits.|
|{{8Digits}} |Eight digits.|
|{{9Digits}} |Nine digits.|
|{{10Digits}} |Ten digits.|
|{{DigitOrSymbol}} |One digit or symbol ("-+=/").|
|{{DigitsOrSymbols}} |One or more digits or symbols ("-+=/").|
|{{2DigitsOrSymbols}} |Two digits or symbols ("-+=/").|
|{{3DigitsOrSymbols}} |Three digits or symbols ("-+=/").|
|{{4DigitsOrSymbols}} |Four digits or symbols ("-+=/").|
|{{5DigitsOrSymbols}} |Five digits or symbols ("-+=/").|
|{{6DigitsOrSymbols}} |Six digits or symbols ("-+=/").|
|{{7DigitsOrSymbols}} |Seven digits or symbols ("-+=/").|
|{{8DigitsOrSymbols}} |Eight digits or symbols ("-+=/").|
|{{9DigitsOrSymbols}} |Nine digits or symbols ("-+=/").|
|{{10DigitsOrSymbols}} |Ten digits or symbols ("-+=/").|
|{{Letter}} |One letter from any language.|
|{{Letters}} |One or more letters from any language.|
|{{2Letters}} |Two letters from any language.|
|{{3Letters}} |Three letters from any language.|
|{{4Letters}} |Four letters from any language.|
|{{5Letters}} |Five letters from any language.|
|{{6Letters}} |Six letters from any language.|
|{{7Letters}} |Seven letters from any language.|
|{{8Letters}} |Eight letters from any language.|
|{{9Letters}} |Nine letters from any language.|
|{{10Letters}} |Ten letters from any language.|
|{{UppercaseLetter}} |One uppercase letter from any language.|
|{{UppercaseLetters}} |One or more uppercase letters from any language.|
|{{2UppercaseLetters}} |Two uppercase letters from any language.|
|{{3UppercaseLetters}} |Three uppercase letters from any language.|
|{{4UppercaseLetters}} |Four uppercase letters from any language.|
|{{5UppercaseLetters}} |Five uppercase letters from any language.|
|{{6UppercaseLetters}} |Six uppercase letters from any language.|
|{{7UppercaseLetters}} |Seven uppercase letters from any language.|
|{{8UppercaseLetters}} |Eight uppercase letters from any language.|
|{{9UppercaseLetters}} |Nine uppercase letters from any language.|
|{{10UppercaseLetters}} |Ten uppercase letters from any language.|
|{{LetterOrDigit}} |One letter or digit.|
|{{LettersOrDigits}} |One or more letters or digits.|
|{{2LettersOrDigits}} |Two letters or digits.|
|{{3LettersOrDigits}} |Three letters or digits.|
|{{4LettersOrDigits}} |Four letters or digits.|
|{{5LettersOrDigits}} |Five letters or digits.|
|{{6LettersOrDigits}} |Six letters or digits.|
|{{7LettersOrDigits}} |Seven letters or digits.|
|{{8LettersOrDigits}} |Eight letters or digits.|
|{{9LettersOrDigits}} |Nine letters or digits.|
|{{10LettersOrDigits}} |Ten letters or digits.|
|{{UppercaseLetterOrDigit}} |One uppercase letter or digit.|
|{{UppercaseLettersOrDigits}} |One or more uppercase letters or digits.|
|{{2UppercaseLettersOrDigits}} |Two uppercase letters or digits.|
|{{3UppercaseLettersOrDigits}} |Three uppercase letters or digits.|
|{{4UppercaseLettersOrDigits}} |Four uppercase letters or digits.|
|{{5UppercaseLettersOrDigits}} |Five uppercase letters or digits.|
|{{6UppercaseLettersOrDigits}} |Six uppercase letters or digits.|
|{{7UppercaseLettersOrDigits}} |Seven uppercase letters or digits.|
|{{8UppercaseLettersOrDigits}} |Eight uppercase letters or digits.|
|{{9UppercaseLettersOrDigits}} |Nine uppercase letters or digits.|
|{{10UppercaseLettersOrDigits}} |Ten uppercase letters or digits.|
|{{LetterOrDigitOrSymbol}} |One letter, or digit, or symbol ("-+=/").|
|{{LettersOrDigitsOrSymbols}} |One or more letters, or digits, or symbols ("-+=/").|
|{{2LettersOrDigitsOrSymbols}} |Two letters, or digits, or symbols ("-+=/").|
|{{3LettersOrDigitsOrSymbols}} |Three letters, or digits, or symbols ("-+=/").|
|{{4LettersOrDigitsOrSymbols}} |Four letters, or digits, or symbols ("-+=/").|
|{{5LettersOrDigitsOrSymbols}} |Five letters, or digits, or symbols ("-+=/").|
|{{6LettersOrDigitsOrSymbols}} |Six letters, or digits, or symbols ("-+=/").|
|{{7LettersOrDigitsOrSymbols}} |Seven letters, or digits, or symbols ("-+=/").|
|{{8LettersOrDigitsOrSymbols}} |Eight letters, or digits, or symbols ("-+=/").|
|{{9LettersOrDigitsOrSymbols}} |Nine letters, or digits, or symbols ("-+=/").|
|{{10LettersOrDigitsOrSymbols}} |Ten letters, or digits, or symbols ("-+=/").|
|{{UppercaseLetterOrDigitOrSymbol}} |One uppercase letter, or digit, or symbol ("-+=/").|
|{{UppercaseLettersOrDigitsOrSymbols}} |One or more uppercase letters, or digits, or symbols ("-+=/").|
|{{2UppercaseLettersOrDigitsOrSymbols}} |Two uppercase letters, or digits, or symbols ("-+=/").|
|{{3UppercaseLettersOrDigitsOrSymbols}} |Three uppercase letters, or digits, or symbols ("-+=/").|
|{{4UppercaseLettersOrDigitsOrSymbols}} |Four uppercase letters, or digits, or symbols ("-+=/").|
|{{5UppercaseLettersOrDigitsOrSymbols}} |Five uppercase letters, or digits, or symbols ("-+=/").|
|{{6UppercaseLettersOrDigitsOrSymbols}} |Six uppercase letters, or digits, or symbols ("-+=/").|
|{{7UppercaseLettersOrDigitsOrSymbols}} |Seven uppercase letters, or digits, or symbols ("-+=/").|
|{{8UppercaseLettersOrDigitsOrSymbols}} |Eight uppercase letters, or digits, or symbols ("-+=/").|
|{{9UppercaseLettersOrDigitsOrSymbols}} |Nine uppercase letters, or digits, or symbols ("-+=/").|
|{{10UppercaseLettersOrDigitsOrSymbols}} |Ten uppercase letters, or digits, or symbols ("_-+=/").|
|{{Dollar}} |Dollar sign ($).|
|{{Euro}} |Euro sign (€).|
|{{Pound}} |Pound sign (£).|
|{{Yen}} |Yen sign (¥).|
|{{Yuan}} |Yuan sign (¥).|
|{{CurrencySymbol}} |Any currency symbol ($, €, £, ¥, etc.)|
|{{Dot}} |Single dot symbol (".").|
|{{Comma}} |Single comma symbol (",").|
|{{Colon}} |Single colon symbol (":").|
|{{Semicolon}} |Single semicolon symbol (";").|
|{{Minus}} |Single minus (dash, hyphen) symbol ("-").|
|{{Slash}} |Slash symbol ("/").|
|{{Backslash}} |Backslash symbol ("\").|
|{{Percent}} |Percent symbol ("%").|
|{{LineStart}} |Start of line (virtual symbol).|
|{{LineEnd}} |End of line (virtual symbol).|
|{{SentenceWithSingleSpaces}} |Single-space-separated sequence of words and symbols. Breaks on double space.|
|{{SentenceWithDoubleSpaces}} |Extended {{SentenceWithSingleSpaces}} macro allowing two spaces between words. Breaks on triple space.|
|{{EndOfPage}} |End of page or end of document.|
|{{WordBoundary}} |Start or end of word (virtual symbol).|
|{{OpeningCurlyBrace}} |Opening curly brace symbol ("{").|
|{{ClosingCurlyBrace}} |Closing curly brace symbol ("}").|
|{{OpeningParenthesis}} |Opening parenthesis symbol ("(").|
|{{ClosingParenthesis}} |Closing parenthesis symbol (")").|
|{{OpeningSquareBracket}} |Opening square bracket symbol ("[").|
|{{ClosingSquareBracket}} |Closing square bracket symbol ("]").|
|{{OpeningAngleBracket}} |Opening angle bracket symbol ("<").|
|{{ClosingAngleBracket}} |Closing angle bracket symbol (">").|
|{{DateMM/DD/YY}} |Date in format "01/01/19" (with leading zero).|
|{{DateM/D/YY}} |Date in format "1/1/19" (without leading zero).|
|{{DateMM/DD/YYYY}} |Date in format "01/01/2019" (with leading zero).|
|{{DateM/D/YYYY}} |Date in format "1/1/2019" (without leading zero).|
|{{DateMM-DD-YY}} |Date in format "01-01-19" (with leading zero).|
|{{DateM-D-YY}} |Date in format "1-1-19" (without leading zero).|
|{{DateMM-DD-YYYY}} |Date in format "01-01-2019" (with leading zero).|
|{{DateM-D-YYYY}} |Date in format "1-1-2019" (without leading zero).|
|{{DateMM.DD.YY}} |Date in format "01.01.19" (with leading zero).|
|{{DateM.D.YY}} |Date in format "1.1.19" (without leading zero).|
|{{DateMM.DD.YYYY}} |Date in format "01.01.2019" (with leading zero).|
|{{DateM.D.YYYY}} |Date in format "1.1.2019" (without leading zero).|
|{{DateDD/MM/YY}} |Date in format "01/01/19" (with leading zero).|
|{{DateD/M/YY}} |Date in format "1/1/19" (without leading zero).|
|{{DateDD/MM/YYYY}} |Date in format "01/01/2019" (with leading zero).|
|{{DateD/M/YYYY}} |Date in format "1/1/2019" (without leading zero).|
|{{DateDD-MM-YY}} |Date in format "01-01-19" (with leading zero).|
|{{DateD-M-YY}} |Date in format "1-1-19" (without leading zero).|
|{{DateDD-MM-YYYY}} |Date in format "01-01-2019" (with leading zero).|
|{{DateD-M-YYYY}} |Date in format "1-1-2019" (without leading zero).|
|{{DateDD.MM.YY}} |Date in format "01.01.19" (with leading zero).|
|{{DateD.M.YY}} |Date in format "1.1.19" (without leading zero).|
|{{DateDD.MM.YYYY}} |Date in format "01.01.2019" (with leading zero).|
|{{DateD.M.YYYY}} |Date in format "1.1.2019" (without leading zero).|
|{{DateYYYYMMDD}} |Date in format "20190101".|
|{{DateYYYY/MM/DD}} |Date in format "2019/01/01" (with leading zero).|
|{{DateYYYY/M/D}} |Date in format "2019/1/1" (without leading zero).|
|{{DateYYYY-MM-DD}} |Date in format "2019-01-01" (with leading zero).|
|{{DateYYYY-M-D}} |Date in format "2019-1-1" (without leading zero).|
|{{Anything}} |Any characters up to the next macro in the expression.|
|{{AnythingGreedy}} |Any characters up to the next macro in the expression or to the end of line. Greedy version.|
|{{ToggleSingleLineMode}} |Enables or disables single-line mode. In single-line mode, {{Anything}} and {{AnythingGreedy}} macros do not stop at the end of the line and proceed to the next line of text.|
|{{ToggleCaseInsensitiveMode}} |Enables or disables case-insensitive mode.|
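
To build intuition for how macros combine with literal text in an expression, the sketch below approximates a few of them as Python regular expression fragments. The patterns in `MACRO_PATTERNS` and the helpers `expand_macros` and `extract` are assumptions made for illustration only; the parser's real definitions are not shown in this guide and may differ (for example, `{{Number}}` follows the template culture, which this sketch ignores).

```python
import re
from typing import Optional

# Hypothetical regex stand-ins for a handful of macros (assumptions, not the
# parser's actual definitions). {{Number}} is hard-coded to "." and "," here.
MACRO_PATTERNS = {
    "Digits": r"\d+",
    "2Digits": r"\d{2}",
    "Space": r" ",
    "Spaces": r" +",
    "Dollar": r"\$",
    "Colon": r":",
    "Minus": r"-",
    "Number": r"-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?",
}


def expand_macros(expression: str) -> str:
    """Replace every {{Name}} with its approximate regex fragment."""

    def substitute(match):
        name = match.group(1)
        if name not in MACRO_PATTERNS:
            raise KeyError(f"no approximation defined for {{{{{name}}}}}")
        # Wrap in a non-capturing group so neighbouring fragments do not merge.
        return f"(?:{MACRO_PATTERNS[name]})"

    return re.sub(r"\{\{([^{}]+)\}\}", substitute, expression)


def extract(expression: str, text: str) -> Optional[str]:
    """Return the first parenthesized capture of the expanded expression."""
    match = re.search(expand_macros(expression), text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract("Invoice Number: ({{Digits}})", "Invoice Number: 1234567"))
    print(extract("Total: {{Dollar}}({{Number}})", "Total: $50.00"))
```

The parentheses written in the expression itself mark the part of the match that becomes the field value, which is why `extract` returns capture group 1.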

Special Functions

You can also insert a so-called special function, which looks like this: $$functionName. Special functions provide AI-powered value detection, such as finding a company name, the maximum number in a whole document, the maximum date, or even finding and decoding a QR code value inside a document.

All special functions are listed here

APPENDIX 2: Sample templates

Sample 1

Sample document text:

    DigitalOcean
    101 Avenue of the Americas, 10th Floor
    New York, NY 10013
                                                        Date Issued: February 1, 2016
                                                         Period: January 1 - 31, 2016
                                                              Invoice Number: 1234567

        Description                                 Hours     Start          End            USD
        Website-Dev (1GB)                           744       01-01 00:00    01-31 23:59    $10.00
        Website-Live (1GB)                          744       01-01 00:00    01-31 23:59    $10.00
        Database-Live (2GB)                         744       01-01 00:00    01-31 23:59    $20.00
        Tasks-Dev (1GB)                             744       01-01 00:00    01-31 23:59    $10.00
                                                                                     Total: $50.00

     Bill To:
     Samee Sikka <admin@meee.org>
     meee.org
     Gouran

        If you have a credit card on file it will be automatically charged within 24 hours.

Sample template (JSON):

{
  "templateVersion": 4,
  "templatePriority": 0,
  "templateName": "DigitalOcean Invoice",
  "objects": [
    {
      "name": "companyName",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "static",
        "expression": "DigitalOcean"
      }
    },
    {
      "name": "invoiceId",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Invoice Number: ({{Digits}})",
        "regex": true
      }
    },
    {
      "name": "dateIssued",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Date Issued: ({{SmartDate}})",
        "dataType": "date",
        "dateFormat": "auto-mdy"
      }
    },
    {
      "name": "total",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Total: {{Dollar}}({{Number}})",
        "dataType": "decimal"
      }
    },
    {
      "name": "currency",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "static",
        "expression": "USD"
      }
    },
    {
      "name": "table1",
      "objectType": "table",
      "tableProperties": {
        "start": {
          "expression": "Description{{Spaces}}Hours"
        },
        "end": {
          "expression": "Total:"
        },
        "row": {
          "expression": "{{LineStart}}{{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<hours>{{Digits}}){{Spaces}}(?<start>{{2Digits}}{{Minus}}{{2Digits}}{{Space}}{{2Digits}}{{Colon}}{{2Digits}}){{Spaces}}(?<end>{{2Digits}}{{Minus}}{{2Digits}}{{Space}}{{2Digits}}{{Colon}}{{2Digits}}){{Spaces}}{{Dollar}}(?<unitPrice>{{Number}})",
          "regex": true
        },
        "columns": [
          {
            "name": "hours",
            "type": "integer"
          },
          {
            "name": "unitPrice",
            "type": "decimal"
          }
        ]
      }
    }
  ]
}
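
To see what the `row` expression of `table1` captures, here is a hand-translated Python approximation run against the table portion of the sample document. This is a sketch under assumptions: the macro-to-regex translations are guesses at the macros' behavior, `parse_table` is a hypothetical helper, and Python's `re` module writes named groups as `(?P<name>...)` rather than the `(?<name>...)` form used in the template.

```python
import re

SAMPLE_TEXT = """\
    Description                                 Hours     Start          End            USD
    Website-Dev (1GB)                           744       01-01 00:00    01-31 23:59    $10.00
    Website-Live (1GB)                          744       01-01 00:00    01-31 23:59    $10.00
    Database-Live (2GB)                         744       01-01 00:00    01-31 23:59    $20.00
    Tasks-Dev (1GB)                             744       01-01 00:00    01-31 23:59    $10.00
                                                                                 Total: $50.00
"""

# Approximate translation of the template's row expression:
# {{LineStart}} -> ^, {{Spaces}} -> " +", {{SentenceWithSingleSpaces}} -> words
# separated by single spaces, {{Digits}} -> \d+, {{Dollar}} -> \$.
ROW = re.compile(
    r"^ +(?P<description>\S+(?: \S+)*) +"
    r"(?P<hours>\d+) +"
    r"(?P<start>\d{2}-\d{2} \d{2}:\d{2}) +"
    r"(?P<end>\d{2}-\d{2} \d{2}:\d{2}) +"
    r"\$(?P<unitPrice>\d+(?:\.\d+)?)",
    re.MULTILINE,
)


def parse_table(text):
    """Collect rows found between the table's start and end expressions."""
    start = re.search(r"Description +Hours", text)
    end = re.search(r"Total:", text)
    body = text[start.end():end.start()] if start and end else text
    rows = []
    for match in ROW.finditer(body):
        row = match.groupdict()
        # Mirror the template's "columns" section: hours is an integer,
        # unitPrice is a decimal (a float is used here for brevity).
        row["hours"] = int(row["hours"])
        row["unitPrice"] = float(row["unitPrice"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_table(SAMPLE_TEXT):
        print(row)
```

Columns that are not listed under `columns` in the template, such as `description`, `start`, and `end`, stay as plain strings in this sketch.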

Result (JSON):

{
  "templateName": "DigitalOcean Invoice",
  "templateVersion": "4",
  "objects": [
    {
      "name": "companyName",
      "objectType": "field",
      "value": "DigitalOcean"
    },
    {
      "name": "invoiceId",
      "objectType": "field",
      "value": "1234567",
      "pageIndex": 0
    },
    {
      "name": "dateIssued",
      "objectType": "field",
      "value": "2016-02-01T00:00:00",
      "pageIndex": 0
    },
    {
      "name": "total",
      "objectType": "field",
      "value": 50.00,
      "pageIndex": 0
    },
    {
      "name": "currency",
      "objectType": "field",
      "value": "USD"
    },
    {
      "name": "table1",
      "objectType": "table",
      "rows": [
        {
          "description": {
            "value": "Website-Dev (1GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 10.00,
            "pageIndex": 0
          }
        },
        {
          "description": {
            "value": "Website-Live (1GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 10.00,
            "pageIndex": 0
          }
        },
        {
          "description": {
            "value": "Database-Live (2GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 20.00,
            "pageIndex": 0
          }
        },
        {
          "description": {
            "value": "Tasks-Dev (1GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 10.00,
            "pageIndex": 0
          }
        }
      ]
    }
  ]
}
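
Once a result like the one above is available, a common next step is to look objects up by name and sanity-check the extraction. The following sketch uses a trimmed copy of the result (only the keys it needs) and checks that the `unitPrice` values in `table1` add up to the `total` field. The helper names `index_objects` and `rows_total` are hypothetical and not part of any PDF.co library.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample result, keeping only the keys used below.
RESULT_JSON = """
{
  "objects": [
    {"name": "total", "objectType": "field", "value": 50.00, "pageIndex": 0},
    {"name": "table1", "objectType": "table", "rows": [
      {"unitPrice": {"value": 10.00, "pageIndex": 0}},
      {"unitPrice": {"value": 10.00, "pageIndex": 0}},
      {"unitPrice": {"value": 20.00, "pageIndex": 0}},
      {"unitPrice": {"value": 10.00, "pageIndex": 0}}
    ]}
  ]
}
"""


def index_objects(result):
    """Map each object's name to the object for direct lookup."""
    return {obj["name"]: obj for obj in result["objects"]}


def rows_total(table, column="unitPrice"):
    """Sum one column over all rows of a table object."""
    return sum((row[column]["value"] for row in table["rows"]), Decimal("0"))


# parse_float=Decimal avoids binary floating-point rounding on money values.
result = json.loads(RESULT_JSON, parse_float=Decimal)
objects = index_objects(result)

if __name__ == "__main__":
    total = objects["total"]["value"]
    computed = rows_total(objects["table1"])
    print("rows add up to total:", computed == total)
```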

Copyright (c) 2018-2022 ByteScout, Inc.

PDF.co (on-demand platform) with Document Parser

ByteScout (on-prem tools)

{% endraw %}
