
Field Map Document

The field map is a YAML document containing a list of user-defined fields from the PDF, which Sheetloom uses to generate the fields (columns) of the CSV file. It is typically created before the PDF-to-CSV conversion and uploaded into Sheetloom.

The map can be created externally, or within Sheetloom by defining fields in the text preview. Both options are described below.

Each field in the map uses one of three entry types: Heading, Expression (a regular expression), or Boolean Expression.

Entry Type | Description
Heading | Used when the field value is in a precise position in all PDF documents; otherwise an Expression is used.
Expression | Used when the field value is within a larger body of selected text, or when the target text is not, or may not be, in a precise position in the PDF documents.
Boolean Expression | Used to indicate whether a field value exists (true or false by default).

Knowledge of YAML and regular expressions is a prerequisite.

1. Create a Field Map Externally

1.1 Identify Fields

Identify fields from the PDF that are required in the final CSV table, and the appropriate field entry type for each of them.

1.2 Compile a Field Map

Each of the three possible field entry types requires a different set of fields in the YAML document.

1.2.1 Header Field Entry Type

[Figure: example YAML entry for the Header entry type]


Field | Description
Name | Field name for CSV output.
Type | Entry type = Header.
Indent | Number of characters to count from the left-hand side of the row the value is on.
Length | Maximum number of characters extracted for the value.
startIndex | Number of characters from the beginning of the PDF text to the start of the value.
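The Heading fields can be read as a fixed-position slice of the extracted text. The sketch below shows one possible interpretation, not Sheetloom's actual logic. It assumes three things: `startIndex` locates the row that holds the value, `Indent` is counted from the start of that row, and `Length` caps the number of characters returned. The function name `extract_heading_value` and the sample text are hypothetical.

```python
def extract_heading_value(text: str, entry: dict) -> str:
    """Return the value for a Heading entry (hypothetical interpretation)."""
    start = entry["startIndex"]
    # Assumption: startIndex points into the row that holds the value,
    # so expand it to the full row.
    row_start = text.rfind("\n", 0, start) + 1
    row_end = text.find("\n", start)
    if row_end == -1:
        row_end = len(text)
    row = text[row_start:row_end]
    # Indent counts characters from the left-hand side of that row;
    # Length is the maximum number of characters kept.
    indent = entry["Indent"]
    return row[indent:indent + entry["Length"]].strip()


sample = "Durée Date de fin\n12 mois    2024-12-31\n"
entry = {
    "Name": "Durée",
    "Type": "Header",
    "Indent": 0,
    "Length": 7,
    "startIndex": sample.index("12 mois"),  # offset of the value's first character
}
print(extract_heading_value(sample, entry))  # 12 mois
```

Changing `Indent` to 11 and `Length` to 10 on the same row would return the date instead. This shows why the Heading type only suits values that sit in the same position in every PDF.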

1.2.2 Expression Field Entry Type

[Figure: example YAML entry]


Field | Description
Name | Field name for CSV output.
Type | Entry type = Expression.
Ordinal | The instance (match) of the expression to return if more than one is found. Defaults to the first if blank.
Expression | The regular expression.
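The following minimal sketch shows how an Expression entry with an Ordinal might be evaluated. It is a hypothetical interpretation under these assumptions:

- `Ordinal` is 1-based.
- A blank `Ordinal` selects the first match.
- The first capture group, when present, holds the value.

The function name and sample text are illustrative only.

```python
import re


def extract_expression_value(text: str, entry: dict) -> str:
    """Return the value for an Expression entry (hypothetical interpretation)."""
    matches = list(re.finditer(entry["Expression"], text))
    # Assumption: Ordinal is 1-based, and a blank or missing Ordinal means the first match.
    ordinal = entry.get("Ordinal") or 1
    if ordinal > len(matches):
        return ""  # assumption: no match produces an empty CSV cell
    match = matches[ordinal - 1]
    # Assumption: when the regex has a capture group, group 1 holds the value;
    # otherwise the whole match is used.
    value = match.group(1) if match.re.groups else match.group(0)
    return (value or "").strip()


text = "Invoice No. 1001\nCredit note No. 2002\n"
entry = {"Name": "Number", "Type": "Expression", "Ordinal": 2, "Expression": r"No\.\s*(\d+)"}
print(extract_expression_value(text, entry))  # 2002
```

With `Ordinal` left blank, the same entry would return `1001`, the first match.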

1.2.3 Boolean Expression Field Entry Type

[Figure: example YAML entry]


Field | Description
Name | Field name for CSV output.
Type | Entry type = Boolean Expression.
Ordinal | The instance (match) of the expression to return if more than one is found. Defaults to the first if blank.
Expression | The regular expression.
true_value | The value to return if the expression is found in the text.
false_value | The value to return if the expression is not found in the text.
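A Boolean Expression entry reduces to a presence test that returns one of two configured strings. The sketch below falls back to the documented defaults "true" and "false" when `true_value` or `false_value` is blank. How Sheetloom applies `Ordinal` to a Boolean Expression is not specified here. The sketch assumes it means "true only if at least that many matches exist". The function name is hypothetical.

```python
import re


def extract_boolean_value(text: str, entry: dict) -> str:
    """Return true_value or false_value for a Boolean Expression entry (sketch)."""
    # Fall back to the documented defaults when the values are missing or blank.
    true_value = entry.get("true_value") or "true"
    false_value = entry.get("false_value") or "false"
    # Assumption: Ordinal N means "true only if at least N matches exist";
    # a blank Ordinal means the first match is enough.
    ordinal = entry.get("Ordinal") or 1
    count = len(re.findall(entry["Expression"], text))
    return true_value if count >= ordinal else false_value


text = "Payment method: bank transfer\n"
print(extract_boolean_value(text, {"Expression": r"bank transfer"}))  # true
print(extract_boolean_value(text, {"Expression": r"cheque", "true_value": "Yes", "false_value": "No"}))  # No
```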

Once complete, the field map can be uploaded to Sheetloom.
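Before uploading, it can help to check the map against the tables in section 1.2. The sketch below validates a field map after it has been loaded into Python as a list of dictionaries, for example with a YAML library. It makes these assumptions:

- `Ordinal`, `true_value` and `false_value` are optional.
- The Heading type may be written as either "Header" or "Heading".

`validate_field_map` is a hypothetical helper, not part of Sheetloom.

```python
import re

# Required keys per entry type, taken from the tables in section 1.2.
# Assumptions: Ordinal, true_value and false_value are optional, and the
# Heading type may appear as either "Header" or "Heading".
HEADING_KEYS = {"Name", "Type", "Indent", "Length", "startIndex"}
REQUIRED_KEYS = {
    "Header": HEADING_KEYS,
    "Heading": HEADING_KEYS,
    "Expression": {"Name", "Type", "Expression"},
    "Boolean Expression": {"Name", "Type", "Expression"},
}


def validate_field_map(entries: list) -> list:
    """Return a list of problems found in a field map (empty if none)."""
    problems = []
    if not entries:
        problems.append("field map contains no fields")
    for i, entry in enumerate(entries):
        entry_type = entry.get("Type")
        required = REQUIRED_KEYS.get(entry_type)
        if required is None:
            problems.append(f"entry {i}: unknown Type {entry_type!r}")
            continue
        missing = sorted(required - set(entry))
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
        if "Expression" in entry:
            try:
                re.compile(entry["Expression"])
            except (re.error, TypeError) as exc:
                problems.append(f"entry {i}: invalid regular expression ({exc})")
    return problems


field_map = [
    {"Name": "Rue", "Type": "Expression", "Expression": r"(?:Numero\s+Rue\s+\d+\s+)(.*)"},
    {"Name": "Virement", "Type": "Boolean Expression", "Expression": "("},
]
print(validate_field_map(field_map))  # reports the unbalanced "(" in entry 1
```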


2. Create a Field Map in Sheetloom

This is done in the text preview once text files have been extracted from the uploaded PDFs. The user creates a field map by manually defining the fields. An existing field map template is used as the base document and must contain at least one field.

[Figure: field map template]


2.1 Upload a Field Map Template

From the PDFs page, click the upload icon in the Field Map column and upload the template, which is in YAML format. The upload icon then becomes inactive.

Click the txt icon in the TXT column to open a list of text files, then select the eye icon for any of the files to open the text preview. Fields can now be added directly.

2.2 Add a Field to the Field Map

A field is identified using one of three entry types, depending on how the text is laid out: Heading, Expression (regex), or Boolean Expression.

2.2.1 Add a Heading Type

Used when the field name is on one line and its value is on the next line, with no other text preceding the value.

Select the range to include the header name, with the field value on the following line. If other text precedes the field value the Expression type must be used.

Click Add Entry to open the field configuration panel. Type=Heading is shown by default. The full text of the first selected line is displayed in the Heading box. This entry becomes the field name in the field map and generated CSV, and can be edited to a short, meaningful name.

The Expression box can be disregarded.


[Figure: selecting a Heading field in the text preview]

The graphic shows a Heading name changed from "Durée Date de fin" to "Durée", since only "Durée" is needed as the field name. The field value is on the following line with no other text preceding it, so Type=Heading can be used.
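The sketch below relates a Heading selection like the one in the graphic to the keys listed in section 1.2.1. It derives those keys from a heading line and the line that follows it. This is a hypothetical illustration, not how Sheetloom computes them internally. It assumes the value starts at the first non-blank character of the next line. It also sets `Length` to the current value's length, although a real map might use a larger maximum. `build_heading_entry` and the sample text are made up for the example.

```python
def build_heading_entry(text: str, heading: str, name: str) -> dict:
    """Derive Heading keys from a heading line and the line after it (sketch)."""
    lines = text.split("\n")
    row = lines.index(heading) + 1  # the value sits on the line after the heading
    value_line = lines[row]
    # Indent: characters before the first non-blank character of the value line.
    indent = len(value_line) - len(value_line.lstrip())
    # Offset of the value line from the start of the text (+1 per "\n").
    row_offset = sum(len(line) + 1 for line in lines[:row])
    return {
        "Name": name,  # the edited, short field name
        "Type": "Header",  # entry type value as listed in section 1.2.1
        "Indent": indent,
        "Length": len(value_line.strip()),
        "startIndex": row_offset + indent,
    }


text = "Contrat\nDurée Date de fin\n12 mois\n"
print(build_heading_entry(text, "Durée Date de fin", "Durée"))
```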

Click Add Entry again to apply and save the changes.

Continue adding fields using any of the three types described. When all fields have been added, click Download Field Map. This field map can then replace the field map template: click the Replace Field Map icon in the Field Map column and upload the new file.


2.2.2 Add an Expression Type

A regular expression (regex) is used to identify and isolate specific text within a larger body of selected text, or when the field header and field value are on the same line. The matched text becomes the field value in the generated CSV file.

Select the range to include the field value.

Click Add Entry to open the field configuration panel. Type=Heading is shown by default; change it to Type=Expression.

In the Expression box, edit or provide the regular expression (regex) that extracts the required field value.

The full text of the first selected line is displayed in the Heading box. This entry becomes the field name in the field map and generated CSV, so it should be edited to a short, meaningful name.

[Figure: defining an Expression field in the text preview]

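The expression in the Expression box captures the text where "Avenue J.F. Kennedy" appears and disregards the rest of the text highlighted in yellow. The full expression is: (?:Numero\s+Rue\s+\d+\s+)(.*)

The non-capturing group (?:...) matches "Numero Rue" and the street number, and the capture group (.*) returns the remainder of the line.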

The field name in the Heading box has been edited to reflect that only Rue (the street name) is needed.
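The expression from this example can be checked against sample text with Python's `re` module before it goes into the field map. The two sample strings are hypothetical, since the exact layout of the source PDF is not shown.

```python
import re

# The expression from the example: the non-capturing group (?:...) consumes
# "Numero Rue" and the street number, and (.*) captures the rest of that line.
PATTERN = re.compile(r"(?:Numero\s+Rue\s+\d+\s+)(.*)")

# Hypothetical text; the exact layout in the PDF may differ.
same_line = "Adresse\nNumero Rue 12 Avenue J.F. Kennedy\nCode postal 75016\n"
next_line = "Numero Rue\n12 Avenue J.F. Kennedy\n"

print(PATTERN.search(same_line).group(1))  # Avenue J.F. Kennedy
print(PATTERN.search(next_line).group(1))  # Avenue J.F. Kennedy
```

Because `\s` also matches line breaks, the expression works whether the street number is on the same line as "Numero Rue" or on the line below. By contrast, `(.*)` stops at the end of a line because `.` does not match a newline by default.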

Click Add Entry to apply and save the changes.

Continue adding fields using any of the three types described. When all fields have been added, click Download Field Map. This field map can then replace the field map template: click the Replace Field Map icon in the Field Map column and upload the new file.

2.2.3 Add a Boolean Expression Type

The value returned by a Boolean Expression indicates whether the text was found: "true" means it was, and "false" means it was not. These become the field values in the generated CSV file. They are system defaults and can be edited (true_value and false_value) in the field map document that is created.

Select the range to include the field value.

Click Add Entry to open the field configuration panel. Type=Heading is shown by default; change it to Type=Boolean Expression.

In the Expression box, edit or provide the regular expression (regex) that matches the required text. Because Boolean Expression is selected, the output is true or false depending on whether a match is found.

The full text of the first selected line is displayed in the Heading box. This entry becomes the field name in the field map document and generated CSV, and can be edited to a short, meaningful name.

Click Add Entry to apply and save the changes.

Continue adding fields using any of the types described. When all fields have been added, click Download Field Map. This field map can then replace the template: click the Replace Field Map icon in the Field Map column and upload the new file.