Convert PDF to CSV

This page explains the stages for converting PDF files to CSV format: uploading and appending PDFs, extracting text fields, and generating the CSV.

Upload a PDF

From the PDFs page, click Upload PDF and select a file. To change the name of the folder the PDF file is stored in, click Show Advanced Options and enter the new name.

upload pdf


With Advanced Options selected, the folder name can be changed from the default, which is the PDF name. In this example the name is changed from b000001 (the company registration number) to Registration.

upload pdf advanced


The folder now displays in the PDFs list.

pdf page


info

Consider changing the folder name to describe the content of the PDFs being uploaded. For example, if the PDF is named after the company and contains registration details, changing the folder name to Registration makes sense, as other company PDFs will be appended later.


Append PDFs

Additional PDF files can be appended to a folder immediately after the first upload, or later as they become available.

From the PDFs page, click the PDF folder, then click Append PDF. Click Choose PDFs, select the file or files to append (there is no limit), and click Open.

append files


The files display in the dialogue box. Remove any if required, then select Upload.

pdf confirm append upload


The uploaded files are added to the folder and listed in its table.

pdf-append-in-folder


Extract Text Fields from PDFs

This is the next step in producing a CSV file.

Uploaded PDFs are stored in binary format, so their text must be extracted before a CSV can be produced. This is done with a previously prepared mapping document, which is used to extract the text from the PDF documents and identify where the fields are.

From the PDFs page, click the Upload icon in the Field Map column. Select the mapping document and upload it.

pdf-append-in-folder

Click the Play button in the TXT column. The mapping document runs and extracts all the text from all the PDFs in the folder.

Once extraction completes, the TXT icon becomes live. Click it to see a list of the text documents generated from the uploaded PDFs.

pdf text docs


To preview the extracted text of any of the text documents, click the eye icon.

pdf extract preview

All text has been extracted. Field names specified in the mapping document are shown in blue; field contents in yellow.
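
The idea of a field map can be illustrated with a short Python sketch. The format of Sheetloom's mapping document is not described on this page, so everything below is an assumption: the `FIELD_MAP` dictionary, the `extract_fields` function, the labels, and the sample text are all hypothetical and only show how named fields might be located in extracted text.

```python
import re

# Hypothetical field map: field name -> regular expression with one capture group.
# This is NOT Sheetloom's mapping document format, only an illustration of the idea.
FIELD_MAP = {
    "registration_number": r"Registration Number:\s*(\S+)",
    "company_name": r"Company Name:\s*(.+)",
}

# Made-up sample of text as it might look after extraction from a PDF.
SAMPLE_TEXT = "Company Name: Example Ltd\nRegistration Number: b000001\n"


def extract_fields(text, field_map):
    """Return a dict of field name -> extracted value, or None if not found."""
    fields = {}
    for name, pattern in field_map.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1).strip() if match else None
    return fields


print(extract_fields(SAMPLE_TEXT, FIELD_MAP))
```

In this sketch the dictionary keys play the role of the field names and the captured values play the role of the field contents shown in the preview.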


Generate a CSV from Text Files

Now that text files are generated, a CSV file can be created from them.

On the PDFs page, click the Play button in the CSV column.

pdf generate csv


In a few seconds the CSV files are created. The Table icon in the Table column becomes live. Click on it to generate a preview of the CSV showing the extracted columns from each of the PDF documents.

csv-preview
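
Conceptually, this step turns one set of extracted fields per PDF into one CSV row per PDF. The Python sketch below shows that shape using the standard `csv` module. It does not reflect how Sheetloom builds the file internally; the `rows_to_csv` function, the column names, and the sample values are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Write one CSV row per document; fields missing from a document are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()


# Made-up field values, one dict per PDF document.
documents = [
    {"registration_number": "b000001", "company_name": "Example Ltd"},
    {"registration_number": "b000002"},  # company_name was not found in this document
]

print(rows_to_csv(documents, ["registration_number", "company_name"]))
```

Each column corresponds to a field name and each row to one source document, which matches the layout described for the CSV preview.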


The data is stored in a Sheetloom database table named pdf_tablename (for example, pdf_registration), which can be referenced in a Stitch query.

Parameters can be used in the normal way to filter results. For example, a parameter called registration number can be created in Sheetloom to filter on a particular company during weaving.
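
Filtering a table by a parameter can be sketched in Python with an in-memory SQLite database. This is an analogy only: it does not show Stitch query syntax or Sheetloom's parameter mechanism, and the column names, the `find_company` function, and the sample rows are hypothetical. Only the table name pdf_registration comes from the example above.

```python
import sqlite3


def find_company(conn, registration_number):
    """Return rows whose registration number matches the supplied parameter."""
    cursor = conn.execute(
        "SELECT registration_number, company_name "
        "FROM pdf_registration WHERE registration_number = ?",
        (registration_number,),  # bound parameter, not string concatenation
    )
    return cursor.fetchall()


# Stand-in for the generated table, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdf_registration (registration_number TEXT, company_name TEXT)"
)
conn.executemany(
    "INSERT INTO pdf_registration VALUES (?, ?)",
    [("b000001", "Example Ltd"), ("b000002", "Sample plc")],
)

print(find_company(conn, "b000001"))
```

Passing the value as a bound parameter keeps the query text fixed while the filter value changes, which is the same role a registration number parameter plays when filtering on a single company.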