Field Map Document
The field map is a YAML document listing the user-defined fields that appear in the source PDF. Sheetloom uses the map to extract field values for the CSV file it builds.
It can be created externally in advance and uploaded to Sheetloom, or created from the text document generated during the conversion process. The following describes both methods.
Each field is configured to work with either a heading, regular expression, or boolean expression.
Knowledge of YAML and regular expressions is a prerequisite.
Entry Type | Description |
---|---|
Heading | Used when the field value is in a precise position in all PDF documents; otherwise an Expression is used. |
Expression | Used when the field value text is within a larger body of selected text, or when the target text is not, or may not be, in a precise position in the PDF documents. |
Boolean Expression | Used to indicate if a field value exists or not (true or false by default). |
1. Create an External Field Map
1.1 Identify Fields
Identify fields from the PDF that are required in the final CSV table, and the appropriate field entry type for each of them.
1.2 Compile a Field Map
Each of the three possible field entry types requires a different set of fields in the YAML document.
1.2.1 Header Field Entry Type
Field | Description |
---|---|
Name | Field name for CSV table output. |
Type | Entry type = Header. |
Indent | Characters to count from left-hand side of the row the value is on. |
Length | Maximum number of characters that will be extracted from the value. |
startIndex | Number of characters from beginning of the PDF to the value starting position. |
1.2.2 Expression Field Entry Type
Field | Description |
---|---|
Name | Field name for CSV table output. |
Type | Entry type = Expression. |
Ordinal | The instance of the expression to return if more than one is found. Defaults to first if blank. |
Expression | The regular expression. |
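The way Ordinal selects among several matches can be sketched in Python. This is only an illustration, not Sheetloom's implementation: the function name `extract_expression`, the 1-based ordinal, the preference for the first capture group, and the sample text are all assumptions.

```python
import re

def extract_expression(text, expression, ordinal=None):
    """Return the ordinal-th match of `expression` in `text`, or None.

    Assumption: ordinals are 1-based and a blank ordinal means "first".
    """
    index = (ordinal or 1) - 1
    matches = list(re.finditer(expression, text))
    if index < 0 or index >= len(matches):
        return None
    match = matches[index]
    # Assumption: when the pattern has a capture group, the group is the value.
    return match.group(1) if match.re.groups else match.group(0)

# Hypothetical extracted text with three occurrences of the same pattern.
sample = "Invoice 1001\nInvoice 1002\nInvoice 1003"

first = extract_expression(sample, r"Invoice\s+(\d+)")       # blank ordinal
second = extract_expression(sample, r"Invoice\s+(\d+)", 2)   # second instance
print(first, second)  # 1001 1002
```

Returning `None` when the requested instance does not exist is a design choice of this sketch; how Sheetloom handles that case is not stated here.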
1.2.3 Boolean Expression Field Entry Type
Field | Description |
---|---|
Name | Field name for CSV table output. |
Type | Entry type = Boolean Expression. |
Ordinal | The instance of the expression to return if more than one is found. Defaults to first if blank. |
Expression | The regular expression. |
true_value | The value to return if the expression is present in the text. |
false_value | The value to return if the expression is not present in the text. |
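The role of true_value and false_value can be sketched as follows. The function name `extract_boolean`, the sample text, and the sample patterns are hypothetical; the defaults of true and false follow the entry type table above, and Ordinal is left out for brevity.

```python
import re

def extract_boolean(text, expression, true_value="true", false_value="false"):
    """Return true_value if `expression` occurs in `text`, else false_value."""
    return true_value if re.search(expression, text) else false_value

# Hypothetical extracted text.
sample = "Contract type: Fixed term\nWarranty included"

with_defaults = extract_boolean(sample, r"Warranty\s+included")
with_custom = extract_boolean(sample, r"Insurance", true_value="Yes", false_value="No")
print(with_defaults, with_custom)  # true No
```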
Once completed, the field map can be uploaded to Sheetloom.
2. Create a Field Map in Sheetloom
This is done in the text preview after the text files have been extracted from uploaded PDFs. A user creates a field map by manually defining the fields.
2.1 Open Text File
In the PDFs page, click the txt icon in the TXT column to view the file, or the list of files if more than one PDF has been uploaded. Select the eye icon on one of the files to open the text preview. Fields can now be added directly.
2.2 Add a Field to the Field Map
A field is identified using one of three entry types, depending on how the text is laid out: Heading, Expression (regex), or Boolean Expression.
2.2.1 Add a Heading Type
Used when the field name is on one line and its value is on the next line, with no other text preceding the value.
Select the range to include the header name, with the field value on the following line. If other text precedes the field value the Expression type must be used.
Click Add Entry to launch the field configuration panel. Type=Heading is shown as default. The full text from the first line is displayed in the Heading box. This entry will be the field name in the field map and generated CSV, and can be edited to a short and meaningful name.
The Expression box can be disregarded.
The graphic shows a Heading name that has been changed from Durée Date de fin to Durée. The field value is on the following line with no other text preceding it, so Type=Heading can be used.
Click Add Entry again to apply and save the changes.
Continue to add additional fields using one of the three entry types as described. When all fields have been added, click Download Field Map. The map can now be uploaded to Sheetloom. To do this, click the Upload Field Map icon in the Field Map column and upload the new file. When the field map is opened, field headers appear in blue and values in yellow.
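The Heading layout described in this section (field name on one line, value at the start of the next) can be sketched in Python. This is an illustration under stated assumptions rather than Sheetloom's actual lookup: the function name `extract_heading_value`, the sample text and values, and the reading of Indent and Length as a character slice of the value row are all hypothetical.

```python
def extract_heading_value(text, heading, indent=0, length=None):
    """Return the text under `heading`, taken from the following line.

    Assumption: `indent` is the number of characters skipped from the left
    of the value row, and `length` caps how many characters are taken.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if heading in line:
            value_line = lines[i + 1]
            end = None if length is None else indent + length
            return value_line[indent:end].strip()
    return None

# Hypothetical two-column layout: headings on one row, values on the next.
sample = "\n".join([
    "Durée".ljust(15) + "Date de fin",
    "12 mois".ljust(15) + "31/12/2024",
])

duration = extract_heading_value(sample, "Durée", indent=0, length=15)
end_date = extract_heading_value(sample, "Date de fin", indent=15, length=10)
print(duration, end_date)  # 12 mois 31/12/2024
```

In this sketch the Durée value qualifies for the Heading type because nothing precedes it on its row.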
2.2.2 Add an Expression Type
A regular expression (regex) is used to identify and isolate specific text within a larger body of selected text, or when the field header and field value are on the same line. The isolated text will be the field value in the generated CSV file.
Select the range to include the field value.
Click Add Entry to launch the field configuration panel. Type=Heading is shown as default. Change it to Type=Expression.
In the Expression box, edit or provide the regular expression (regex) to extract the required field value.
The full text from the first line selected is displayed in the Heading box. This entry will be the field name in the mapping document and generated CSV, so it should be edited to give a short and meaningful name.
The expression in the Expression box will capture any text located where "Avenue J.F. Kennedy" is positioned, disregarding the rest in yellow. The full expression is: (?:Numero\s+Rue\s+\d+\s+)(.*)
The field name in the Heading has been edited to reflect the fact that only Rue, the "Street name", is needed.
Click Add Entry to apply and save the changes.
Continue to add additional fields using one of the three entry types as described. When all fields have been added, click Download Field Map. The map can now be uploaded to Sheetloom. To do this, click the Upload Field Map icon in the Field Map column and upload the new file.
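The expression quoted in this section can be tried outside Sheetloom with Python's `re` module. The selected text below is a hypothetical reconstruction of the layout (the street number 12 and the spacing are assumptions); only the expression itself and the street name come from the example above.

```python
import re

# Expression from the example: the non-capturing group skips the headers and
# the street number; the capture group keeps the rest of the line.
EXPRESSION = r"(?:Numero\s+Rue\s+\d+\s+)(.*)"

# Hypothetical selected text: headers on one row, values on the next.
selected_text = "Numero   Rue\n12       Avenue J.F. Kennedy"

match = re.search(EXPRESSION, selected_text)
street_name = match.group(1) if match else None
print(street_name)  # Avenue J.F. Kennedy
```

Because `\s+` also matches line breaks, the same expression works whether the headers and values share a line or sit on consecutive lines.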
2.2.3 Add a Boolean Expression Type
The value returned by the Boolean Expression indicates whether the text was found: true if it was, false if it was not. The value true or false will be the field value in the generated CSV file. These are system defaults and can be edited in the mapping document that is created.
Select the range to include the field value.
Click Add Entry to launch the field configuration panel. Type=Heading is shown as default. Change it to Type=Boolean Expression.
In the Expression box, edit or provide the regular expression (regex) that will extract the required field value. As Boolean Expression has been selected, the output will be true or false depending on whether the value is found.
The full text from the first line selected is displayed in the Heading box. This entry will be the field name in the field map document and generated CSV and can be edited to a short and meaningful name.
Continue to add additional fields using one of the three entry types as described. When all fields have been added, click Download Field Map. The map can now be uploaded to Sheetloom. To do this, click the Upload Field Map icon in the Field Map column and upload the new file.
3. Working With Field Maps
3.1 View Field List
The list of selected field names is displayed across the top of the mapping document. It updates dynamically as fields are added, edited, or removed.
3.2 Edit a Field Map
Edit a field map only if the CSV has not yet been generated. When the CSV is regenerated, changes to the columnar structure are not passed to the final database table, so the database table must be created again.
The changes produce a new map document, which is saved, downloaded, and uploaded to replace the existing map document.
3.2.1 Edit Existing Entries
Navigate to the field map by clicking the eye icon on the TXT files list. Hover over the field to be edited to launch the configuration panel. Make the required changes and click the Save Changes button. Click the Add Entry button to add the changed entry and download the new field map. The new field map can now replace the current field map. Once it has been replaced, the changes take effect.
3.2.2 Add a New Entry
Navigate to the field map by clicking the eye icon on the TXT files list. In the map file, add new fields as described in 2.2 Add a Field to the Field Map. The new field map can now replace the current field map. Once it has been replaced, the changes take effect.
3.2.3 Delete an Entry
Navigate to the field map by clicking the eye icon on the TXT files list. Hover over the field to be deleted to launch the configuration panel. Click the "delete" item. The field will turn red. Click the Add Entry button to add the changed entry and download the new field map. The new field map can now replace the current field map. Once it has been replaced, the removed fields will no longer be included.
To ensure changes are saved to a new map document, (i) the Add Entry button must be clicked after the edit has been saved, and (ii) the field map must have been downloaded before the text file is closed.