
Field Map Document

The field map is a YAML document that lists the user-defined fields appearing in the source PDF. Sheetloom uses the map to extract those fields into the CSV file it builds.

It can be created externally in advance and uploaded to Sheetloom, or created from the text document generated during the conversion process. The following describes both methods.

Each field is configured as one of three entry types: heading, regular expression, or boolean expression.

Knowledge of YAML and regular expressions is a prerequisite.

| Entry Type | Description |
| --- | --- |
| Heading | Used when the field value is in a precise position in all PDF documents; otherwise an Expression is used. |
| Expression | Used when the field value text is within a larger body of selected text, or when the target text is not, or may not be, in a precise position in PDF documents. |
| Boolean Expression | Used to indicate whether a field value exists (true or false by default). |

1. Create an External Field Map

1.1 Identify Fields

Identify fields from the PDF that are required in the final CSV table, and the appropriate field entry type for each of them.

1.2 Compile a Field Map

Each of the three possible field entry types requires a different set of fields in the YAML document.

1.2.1 Header Field Entry Type

[Image: pdf-header-yaml]

| Field | Description |
| --- | --- |
| Name | Field name for CSV table output. |
| Type | Entry type = Header. |
| Indent | Characters to count from the left-hand side of the row the value is on. |
| Length | Maximum number of characters that will be extracted from the value. |
| startIndex | Number of characters from the beginning of the PDF to the value starting position. |
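
As an illustration of how these values relate, the following Python sketch slices a value out of extracted text. It assumes that startIndex is a zero-based character offset into the text generated from the PDF and that Indent is the offset of the value within its row. The function names and the sample text are hypothetical, and Sheetloom's own extraction logic may differ.

```python
def extract_header_value(text: str, start_index: int, length: int) -> str:
    """Return at most `length` characters starting at `start_index`,
    without running past the end of the row."""
    row_end = text.find("\n", start_index)
    if row_end == -1:  # the value sits on the last row of the text
        row_end = len(text)
    end = min(start_index + length, row_end)
    return text[start_index:end].strip()


def indent_of(text: str, start_index: int) -> int:
    """Number of characters between the start of the row and the value."""
    row_start = text.rfind("\n", 0, start_index) + 1
    return start_index - row_start


# Hypothetical extracted text: heading on one row, value on the next.
sample = "Durée Date de fin\n12 mois\nAutre texte\n"
start = sample.index("12 mois")

value = extract_header_value(sample, start, length=20)
indent = indent_of(sample, start)
```

With this sample, `value` is the full row content and `indent` is 0 because nothing precedes the value on its row. A smaller `length` truncates the value.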

1.2.2 Expression Field Entry Type

[Image: pdf-header-yaml]

| Field | Description |
| --- | --- |
| Name | Field name for CSV table output. |
| Type | Entry type = Expression. |
| Ordinal | The instance of the expression to return if more than one is found. Defaults to first if blank. |
| Expression | The regular expression. |
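
The role of Ordinal can be sketched in Python as follows. This is a minimal illustration, not Sheetloom's implementation: it assumes that Ordinal counts from 1 and that the first capture group, when present, is the returned value. The function name and sample text are hypothetical.

```python
import re
from typing import Optional


def extract_expression(text: str, expression: str, ordinal: int = 1) -> Optional[str]:
    """Return the `ordinal`-th match (counting from 1) of `expression` in `text`.

    If the expression has a capture group, the first group is returned;
    otherwise the whole match. Returns None when there are too few matches.
    """
    for position, match in enumerate(re.finditer(expression, text), start=1):
        if position == ordinal:
            return match.group(1) if match.groups() else match.group(0)
    return None


# Hypothetical extracted text containing two dates.
sample = "Date de début 01/01/2024\nDate de fin 31/12/2024\n"
date_pattern = r"\d{2}/\d{2}/\d{4}"

first_date = extract_expression(sample, date_pattern)      # ordinal left blank
second_date = extract_expression(sample, date_pattern, 2)  # second instance
```

Here `first_date` holds the first date in the text and `second_date` the second, which mirrors the "defaults to first if blank" behaviour described in the table.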

1.2.3 Boolean Expression Field Entry Type

[Image: pdf-header-yaml]

| Field | Description |
| --- | --- |
| Name | Field name for CSV table output. |
| Type | Entry type = Boolean Expression. |
| Ordinal | The instance of the expression to return if more than one is found. Defaults to first if blank. |
| Expression | The regular expression. |
| true_value | The value to return if the expression is present in the text. |
| false_value | The value to return if the expression is not present in the text. |
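
The effect of true_value and false_value can be sketched in Python. The function name and sample text below are hypothetical, the sketch ignores Ordinal, and it is only an approximation of the behaviour described in the table.

```python
import re


def evaluate_boolean_expression(
    text: str,
    expression: str,
    true_value: str = "true",
    false_value: str = "false",
) -> str:
    """Return `true_value` if `expression` is found in `text`, else `false_value`."""
    return true_value if re.search(expression, text) else false_value


# Hypothetical extracted text.
sample = "Assurance annulation: incluse\n"

# Default values: the CSV cell would contain "true" or "false".
found = evaluate_boolean_expression(sample, r"Assurance\s+annulation")

# Custom values, as they might be set in the field map.
missing = evaluate_boolean_expression(
    sample, r"Option\s+bagage", true_value="Yes", false_value="No"
)
```

With this sample, `found` is "true" because the text contains the phrase, and `missing` is "No" because the second expression does not match.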

Once completed, the field map can be uploaded to Sheetloom.


2. Create a Field Map in Sheetloom

This is done in the text preview after the text files have been extracted from uploaded PDFs. A user creates a field map by manually defining the fields.

2.1 Open Text File

On the PDFs page, click the txt icon in the TXT column to view the text file, or the list of files if more than one PDF has been uploaded. Click the eye icon on one of the files to open the text preview. Fields can now be added directly.

[Image: view text file]

2.2 Add a Field to the Field Map

A field is identified using one of three entry types, depending on how the text is laid out: Heading, Expression (regex), or Boolean Expression.

2.2.1 Add a Heading Type

Used when the field name is on one line and its value is on the next line, with no other text preceding the value.

Select a range that includes the heading name and, on the following line, the field value. If other text precedes the field value, the Expression type must be used.

Click Add Entry to launch the field configuration panel. Type=Heading is shown by default. The full text from the first line is displayed in the Heading box. This entry will be the field name in the field map and the generated CSV, and can be edited to a short and meaningful name.

The Expression box can be disregarded.


[Image: map-doc-fieldselect]

The graphic shows a Heading name that has been changed from Durée Date de fin to Durée. The field value is on the following line with no other text preceding it, so Type=Heading can be used.

Click Add Entry again to apply and save the changes.

Continue to add fields using one of the three entry types as described. When all fields have been added, click Download Field Map. The map can now be uploaded to Sheetloom: click the Upload Field Map icon in the Field Map column and upload the new file. When the field map is opened, field headers appear in blue and values in yellow.


2.2.2 Add an Expression Type

A regular expression (regex) is used to identify and isolate specific text within a larger body of selected text, or when the field header and field value are on the same line. The matched text will be the field value in the generated CSV file.

Select the range to include the field value.

Click Add Entry to launch the field configuration panel. Type=Heading is shown by default. Change it to Type=Expression.

In the Expression box, edit or provide the regular expression (regex) to extract the required field value.

The full text from the first line selected is displayed in the Heading box. This entry will be the field name in the mapping document and generated CSV, so it should be edited to give a short and meaningful name.

[Image: map-doc-fieldselect]

The expression in the Expression box captures any text located where "Avenue J.F. Kennedy" is positioned and disregards the rest of the text highlighted in yellow. The full expression is: (?:Numero\s+Rue\s+\d+\s+)(.*)

The field name in the Heading has been edited to reflect the fact that only Rue, the "Street name", is needed.
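
The expression above can be tried outside Sheetloom before it is entered in the Expression box. The Python sketch below applies it to a hypothetical piece of extracted text. The house number 12 and the line layout are invented for illustration, and Python's regex engine is assumed to behave like the one Sheetloom uses for this pattern.

```python
import re

# Expression taken from the example above.
expression = r"(?:Numero\s+Rue\s+\d+\s+)(.*)"

# Hypothetical extracted text: header row followed by the value row.
selected_text = "Numero Rue\n12 Avenue J.F. Kennedy\n"

match = re.search(expression, selected_text)

# The non-capturing group skips the headers and the house number;
# the capture group keeps the remainder of the row.
street_name = match.group(1) if match else None
```

With this sample, `street_name` is "Avenue J.F. Kennedy": the headers and the number are matched but not captured, and `.` stops at the end of the row.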

Click Add Entry to apply and save the changes.

Continue to add additional fields using one of the three entry types as described. When all fields have been added, click Download Field Map. The map can now be uploaded to Sheetloom. To do this click on the Upload Field Map icon in the Field Map column and upload the new file.

2.2.3 Add a Boolean Expression Type

The value returned by a Boolean Expression indicates whether the text was found: true indicates that it was, and false that it was not. The value true or false will be the field value in the generated CSV file. These are system default values and can be edited in the mapping document that is created.

Select the range to include the field value.

Click Add Entry to launch the field configuration panel. Type=Heading is shown by default. Change it to Type=Boolean Expression.

In the Expression box, edit or provide the regular expression (regex) that will match the required field value. Because Boolean Expression has been selected, the output will be true or false depending on whether the value is found.

The full text from the first line selected is displayed in the Heading box. This entry will be the field name in the field map document and generated CSV and can be edited to a short and meaningful name.

Continue to add additional fields using one of the three entry types as described. When all fields have been added, click Download Field Map. The map can now be uploaded to Sheetloom. To do this click on the Upload Field Map icon in the Field Map column and upload the new file.

3. Working With Field Maps

3.1 View Field List

The list of selected field names is displayed across the top of the mapping document. It updates dynamically as fields are added, edited, or removed.

[Image: mapping document fields list]

3.2 Edit a Field Map


Caution

Edit a field map only if the CSV has not yet been generated. Changes to the column structure are not passed to the final database table when the CSV is regenerated, so the database table must be created again.


Edits produce a new map document: the changes are saved, the new map is downloaded, and it is then uploaded to replace the existing map document.

3.2.1 Edit Existing Entries

Navigate to the field map by clicking the eye icon on the TXT files list. Hover over the field to be edited to launch the configuration panel. Make the required changes and click the Save Changes button. Click the Add Entry button to add the changed entry and download the new field map. The new field map can now replace the current field map. When this is done the changes will become effective.

3.2.2 Add a New Entry

Navigate to the field map by clicking the eye icon on the TXT files list. In the map file, add new fields as described in 2.2 Add a Field to the Field Map. The new field map can now replace the current field map. When this is done, the changes will become effective.

3.2.3 Delete an Entry

Navigate to the field map by clicking the eye icon on the TXT files list. Hover over the field to be deleted to launch the configuration panel. Click the delete item. The field will turn red. Click the Add Entry button to apply the change and download the new field map. The new field map can now replace the current field map. When this is done, the removed fields will no longer be included.

Danger

To ensure that changes are saved to a new map document, (i) the Add Entry button must be clicked after the edit has been saved, and (ii) the field map must be downloaded before the text file is closed.