Field Map Document
The field map is a YAML document containing a list of user-defined fields from the PDF that Sheetloom uses to generate fields for the CSV file. It is typically created before the PDF-to-CSV conversion process and uploaded into Sheetloom.
The map can be created externally, or within Sheetloom by defining fields within the text document. The following describes both options.
Each field in the map uses one of three entry types: Heading, Expression (a regular expression), or Boolean Expression.
Entry Type | Description |
---|---|
Heading | Used when the field value is in a precise position in all PDF documents, otherwise an Expression is used. |
Expression | Used when field value text is within a larger body of selected text, or when the target text is not or may not be in a precise position in PDF documents. |
Boolean Expression | Used to indicate if a field value exists or not (true or false by default). |
Knowledge of YAML and regular expressions is a prerequisite.
1 Create a Field Map Externally
1.1 Identify Fields
Identify fields from the PDF that are required in the final CSV table, and the appropriate field entry type for each of them.
1.2 Compile a Field Map
Each of the three possible field entry types requires a different set of fields in the YAML document.
1.2.1 Header Field Entry Type
Field | Description |
---|---|
Name | Field name for CSV output. |
Type | Entry type = Header. |
Indent | Number of characters to count from the left-hand side of the row that contains the value. |
Length | Maximum number of characters that will be extracted from the value. |
startIndex | Number of characters from the beginning of the PDF text to the starting position of the value. |
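When compiling a Header entry by hand, the position values can be worked out from the extracted text rather than counted manually. The following is a minimal sketch, not Sheetloom's own logic: the function name `locate_value` and the sample text are hypothetical, and it assumes 0-based offsets with each line break counted as one character, which may differ from how Sheetloom counts.

```python
def locate_value(text: str, value: str) -> dict:
    """Return candidate Indent, Length and startIndex figures for a value.

    Assumption: offsets are 0-based and a line break counts as one character.
    """
    start = text.find(value)  # offset from the beginning of the text
    if start == -1:
        raise ValueError(f"{value!r} not found in text")
    # Position of the first character of the row that holds the value.
    line_start = text.rfind("\n", 0, start) + 1
    return {
        "Indent": start - line_start,  # characters from the left of the row
        "Length": len(value),          # characters to extract
        "startIndex": start,           # characters from the start of the text
    }


# Hypothetical extracted text with two column labels and their values.
sample_text = (
    "Contrat\n"
    + "Durée".ljust(13) + "Date de fin\n"
    + "12 mois".ljust(13) + "31/12/2024\n"
)
position = locate_value(sample_text, "31/12/2024")
print(position)
```

The figures should be checked against a second PDF before they are relied on, because the Header type only works when the value sits in the same position in every document.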
1.2.2 Expression Field Entry Type
Field | Description |
---|---|
Name | Field name for CSV output. |
Type | Entry type = Expression. |
Ordinal | The instance of the expression to return if more than one is found. Defaults to first if blank. |
Expression | The regular expression. |
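The effect of Ordinal can be tried out locally before the entry is added to the map. This is a sketch under stated assumptions: it uses Python's `re` module, whose regex flavour may differ from the one Sheetloom uses; it treats Ordinal as 1-based; and it returns the first capture group when the expression has one. The function name `extract_by_ordinal` and the sample text are hypothetical.

```python
import re
from typing import Optional


def extract_by_ordinal(text: str, expression: str,
                       ordinal: Optional[int] = None) -> Optional[str]:
    """Return the ordinal-th match of expression, or None if there is none.

    Assumption: ordinal is 1-based and a blank ordinal means the first match.
    """
    matches = list(re.finditer(expression, text))
    index = (ordinal or 1) - 1
    if index < 0 or index >= len(matches):
        return None
    match = matches[index]
    # Prefer the first capture group; fall back to the whole match.
    return match.group(1) if match.re.groups else match.group(0)


# Hypothetical text in which the same label occurs three times.
sample_text = "Ref: A100\nRef: B200\nRef: C300"
first = extract_by_ordinal(sample_text, r"Ref:\s+(\w+)")
second = extract_by_ordinal(sample_text, r"Ref:\s+(\w+)", ordinal=2)
print(first, second)
```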
1.2.3 Boolean Expression Field Entry Type
Field | Description |
---|---|
Name | Field name for CSV output. |
Type | Entry type = Boolean Expression. |
Ordinal | The instance of the expression to return if more than one is found. Defaults to first if blank. |
Expression | The regular expression. |
true_value | The value to return if the expression is present in the text. |
false_value | The value to return if the expression is not present in the text. |
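The relationship between the expression and the two return values can be illustrated as follows. This is a minimal sketch rather than Sheetloom's implementation: `boolean_field` is a hypothetical name, Python's `re` module stands in for whatever regex engine Sheetloom uses, and Ordinal is not modelled.

```python
import re


def boolean_field(text: str, expression: str,
                  true_value: str = "true",
                  false_value: str = "false") -> str:
    """Return true_value if expression occurs in text, else false_value."""
    return true_value if re.search(expression, text) else false_value


# Hypothetical extracted text.
sample_text = "Invoice 2041\nStatus: Paid in full"
default_result = boolean_field(sample_text, r"Paid\s+in\s+full")
custom_result = boolean_field(sample_text, r"Overdue",
                              true_value="Y", false_value="N")
print(default_result, custom_result)
```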
Once completed, the field map can be uploaded to Sheetloom.
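Before uploading, it can help to check that each entry carries the fields listed for its type. The sketch below works on entries as plain Python dictionaries, as they might look after the YAML has been loaded with a YAML parser (not shown). The key names and Type values are copied from the tables in section 1.2, but their exact spelling and nesting in a real field map are assumptions, as is the choice to treat Ordinal, true_value and false_value as optional because the document describes defaults for them. `missing_keys` is a hypothetical helper.

```python
# Keys assumed to be mandatory for each entry type (see tables in 1.2).
REQUIRED_KEYS = {
    "Header": {"Name", "Type", "Indent", "Length", "startIndex"},
    "Expression": {"Name", "Type", "Expression"},
    "Boolean Expression": {"Name", "Type", "Expression"},
}


def missing_keys(entry: dict) -> set:
    """Return the assumed-mandatory keys that are absent from entry."""
    entry_type = entry.get("Type")
    if entry_type not in REQUIRED_KEYS:
        raise ValueError(f"Unknown entry type: {entry_type!r}")
    return REQUIRED_KEYS[entry_type] - set(entry)


# Hypothetical entries: the second one lacks its regular expression.
entries = [
    {"Name": "Duration", "Type": "Header",
     "Indent": 0, "Length": 13, "startIndex": 33},
    {"Name": "Street name", "Type": "Expression"},
]
for entry in entries:
    print(entry["Name"], sorted(missing_keys(entry)))
```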
2 Create a Field Map in Sheetloom
This can be done in the text preview once the text files have been extracted from the uploaded PDFs. A user creates a field map by manually defining the fields. An existing field map template is used as the base document and must contain at least one field.
2.1 Upload a Field Map Template
From the PDFs page, click the upload icon in the Field Map column and upload the template, which is in YAML format. The upload icon becomes inactive.
Click the txt icon in the TXT column to open a list of text files, then select the eye icon for any of the files to open the text preview. Fields can now be added directly.
2.2 Add a Field to the Field Map
A field is identified using one of three entry types, depending on the way the text is laid out: Heading, Expression (regex), or Boolean Expression.
2.2.1 Add a Heading Type
Used when the field name is on one line and its value is on the next line, with no other text preceding the value.
Select the range to include the header name, with the field value on the following line. If other text precedes the field value the Expression type must be used.
Click Add Entry to launch the field configuration panel. Type=Heading is shown as default. The full text from the first line is displayed in the Heading box. This entry will be the field name in the field map and generated CSV, and can be edited to a short and meaningful name.
The Expression box can be disregarded.
The graphic shows a Heading name that has been changed from Durée Date de fin to Durée, because only Durée is needed in the field header. The field value is on the following line with no other text preceding it, so Type = Heading can be used.
Click Add Entry again to apply and save the changes.
Continue to add fields using any of the three types described. When all fields have been added, click Download Field Map. This field map can now replace the field map template. To do this, click the Replace Field Map icon in the Field Map column and upload the new file.
2.2.2 Add an Expression Type
A regular expression is used to identify and isolate specific text within a larger body of selected text, or when the field header and field value are on the same line. The isolated text will be the field value in the generated CSV file.
Select the range to include the field value.
Click Add Entry to launch the field configuration panel. Type=Heading is shown as default. Change it to Type=Expression.
In the Expression box, edit or provide the regular expression (regex) to extract the required field value.
The full text from the first line selected is displayed in the Heading box. This entry will be the field name in the mapping document and generated CSV, so it should be edited to give a short and meaningful name.
The expression in the Expression box will capture any text located where "Avenue J.F. Kennedy" is positioned, disregarding the rest in yellow. The full expression is: (?:Numero\s+Rue\s+\d+\s+)(.*)
The field name in the Heading has been edited to reflect the fact that only Rue, the "Street name", is needed.
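The expression quoted above can be tried outside Sheetloom before it is entered in the Expression box. The sketch below uses Python's `re` module, which may not behave identically to Sheetloom's regex engine, and the sample text is a hypothetical reconstruction of the layout, not the text from the graphic. The leading `(?:...)` group matches the labels and the street number without capturing them, so only the remainder of the line ends up in group 1.

```python
import re

# Expression taken from the text above.
EXPRESSION = r"(?:Numero\s+Rue\s+\d+\s+)(.*)"

# Hypothetical extracted text: labels on one row, values on the next.
sample_text = "Numero Rue\n12 Avenue J.F. Kennedy\nCode postal 1855"

match = re.search(EXPRESSION, sample_text)
# Group 1 holds the street name; "." stops at the end of the line.
street_name = match.group(1) if match else None
print(street_name)  # prints: Avenue J.F. Kennedy
```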
Click Add Entry to apply and save the changes.
Continue to add fields using any of the three types described. When all fields have been added, click Download Field Map. This field map can now replace the field map template. To do this, click the Replace Field Map icon in the Field Map column and upload the new file.
2.2.3 Add a Boolean Expression Type
The value returned by the Boolean Expression indicates whether the text was found: true indicates that it was, and false that it was not. These values appear as the field values in the generated CSV file. They are system defaults and can be edited in the field map document that is created.
Select the range to include the field value.
Click Add Entry to launch the field configuration panel. Type=Heading is shown as default. Change it to Type=Boolean Expression.
In the Expression box, edit or provide the regular expression (regex) that matches the text to look for. As Boolean Expression has been selected, the output will be true or false depending on whether that text is found.
The full text from the first line selected is displayed in the Heading box. This entry will be the field name in the field map document and generated CSV and can be edited to a short and meaningful name.
Continue to add fields using any of the types described. When all fields have been added, click Download Field Map. This field map can now replace the template. To do this, click the Replace Field Map icon in the Field Map column and upload the new file.