Convert PDF to CSV
The following explains the stages for converting PDF files to CSV format.
Upload a PDF
From the PDFs page, click Upload PDF and select a file. To change the name of the folder the PDF file is stored in, click Show Advanced Options and input the name.
With Advanced Options selected, the folder name can be changed from the default PDF name. In the example, the folder name is changed from b000001 (the company registration number) to Registration.
The folder now displays in the PDFs list.
Consider changing the folder name to describe the content of the PDF being uploaded. For example, if the PDF is named after the company and contains registration details, changing the folder name to Registration makes sense, as other company PDFs will be appended later.
Append PDFs
Additional PDF files may be available to append immediately, or may not become available until later.
From the PDFs page, click the PDF folder, then click Append PDF. Click Choose PDFs, select the file or files to append (there is no limit), and click Open.
The files display in the dialogue box. Remove any if required, then select Upload.
The uploaded files are added to the folder and the table is appended.
Extract Text Fields from PDFs
This is the next step in producing a CSV file.
Uploaded PDFs are in binary format, so the text must be extracted from them first. This is done with a pre-produced mapping document that extracts the text from the PDF documents and identifies where the fields are.
From the PDFs page, click the Upload icon in the Field Map column. Select the mapping document and upload it.
Click the Play button in the TXT column. The mapping document runs and extracts all the text from all the PDFs in the folder.
Once complete, the TXT icon becomes live. Click it to see a list of the text documents that have been generated from the uploaded PDFs.
To preview the extracted text from any of the text documents, click the eye icon.
All text has been extracted. Field names specified in the mapping document are shown in blue; field contents in yellow.
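The idea of a field map can be illustrated outside Sheetloom with a minimal Python sketch. The format of a real Sheetloom mapping document is not shown in this guide, so the FIELD_MAP dictionary, the regular expressions, the extract_fields function, and the sample text are all hypothetical and only show the general technique of pairing field names with the text that follows them.

```python
import re

# Hypothetical field map: field name -> regex with one capture group.
# This is NOT the Sheetloom mapping document format, only an illustration.
FIELD_MAP = {
    "company_name": r"Company Name:\s*(.+)",
    "registration_number": r"Registration Number:\s*(\w+)",
}


def extract_fields(text, field_map):
    """Return {field name: captured text} for every pattern that matches."""
    fields = {}
    for name, pattern in field_map.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


# Made-up sample of text extracted from a PDF.
sample_text = "Company Name: Example Ltd\nRegistration Number: b000001\n"
print(extract_fields(sample_text, FIELD_MAP))
# {'company_name': 'Example Ltd', 'registration_number': 'b000001'}
```

In this sketch the dictionary keys play the role of the field names (blue in the preview) and the captured groups play the role of the field contents (yellow).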
Generate a CSV from Text Files
Now that text files are generated, a CSV file can be created from them.
On the PDFs page, click the Play button in the CSV column.
In a few seconds the CSV files are created. The Table icon in the Table column becomes live. Click it to generate a preview of the CSV showing the columns extracted from each of the PDF documents.
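As a rough picture of what this stage produces, the following Python sketch turns a list of extracted field dictionaries into CSV text. It assumes one row per PDF document and one column per field, which is an assumption rather than a documented Sheetloom behaviour. The documents list and the fields_to_csv function are hypothetical names used for illustration.

```python
import csv
import io

# Hypothetical per-document field dictionaries, standing in for the
# fields extracted from each text file.
documents = [
    {"company_name": "Example Ltd", "registration_number": "b000001"},
    {"company_name": "Sample Plc"},  # a field missing from one document
]


def fields_to_csv(documents, columns):
    """Write one CSV row per document; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


print(fields_to_csv(documents, ["company_name", "registration_number"]))
```

Using restval="" keeps the columns aligned when a field is not found in one of the documents.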
The data is stored in a database table in Sheetloom called pdf_tablename (for example, pdf_registration) and can be referenced in a Stitch query.
Parameters can be used in the normal way to filter results. For example, a parameter called registration number can be created in Sheetloom to filter on a particular company during weaving.
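The effect of such a parameter can be sketched with a parameterised SQL query. Stitch query syntax is not shown in this guide, so the example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in. The column names, the sample rows, and the companies_for function are hypothetical; only the table name pdf_registration and the registration number b000001 come from the example above.

```python
import sqlite3

# In-memory stand-in for the Sheetloom table; columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdf_registration (company_name TEXT, registration_number TEXT)"
)
conn.executemany(
    "INSERT INTO pdf_registration VALUES (?, ?)",
    [("Example Ltd", "b000001"), ("Sample Plc", "b000002")],
)


def companies_for(conn, registration_number):
    """Return company names whose registration number matches the parameter."""
    cursor = conn.execute(
        "SELECT company_name FROM pdf_registration "
        "WHERE registration_number = :registration_number",
        {"registration_number": registration_number},
    )
    return [row[0] for row in cursor.fetchall()]


print(companies_for(conn, "b000001"))  # ['Example Ltd']
```

Passing the value as a named parameter, rather than building it into the query string, keeps the query reusable for any company.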