PDF Guide

Sheetloom can extract standing data from a PDF and convert it to a CSV table, ready to be used as a data source.

It is particularly useful for a business that handles recurring or multiple PDF files containing the same kind of data. Automating the process saves time, reduces manual data processing errors, and ensures consistency.

Examples of the efficiencies that can be achieved are:

  • multiple company registration PDFs that contain data that an accountancy firm extracts for inclusion in its clients' official financial statements,
  • official company announcement PDFs containing information and data that investment analysts need to extract and analyze rapidly.

The fields to extract from a PDF are identified and a mapping document is produced. Both the PDF and the mapping document are uploaded to Sheetloom.

At runtime, Sheetloom scans the mapping document, maps it onto the PDF to locate the fields, and outputs the results to a text document. This in turn is converted to a CSV and a database table.
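
The flow described above can be illustrated with a minimal Python sketch. It is not Sheetloom's implementation: the format of the mapping document is not described here, so the sketch assumes a hypothetical mapping of column names to regular expressions, and it works on text that has already been extracted from a PDF. The names `FIELD_MAPPING`, `extract_fields` and `rows_to_csv` are illustrative only.

```python
import csv
import io
import re

# Hypothetical mapping: output column name -> regex with one capture group.
# Sheetloom's real mapping document format is not shown in this guide.
FIELD_MAPPING = {
    "company_name": r"Company name:\s*(.+)",
    "reference_number": r"Reference number:\s*(\S+)",
}


def extract_fields(pdf_text, mapping):
    """Locate each mapped field in text already extracted from a PDF."""
    row = {}
    for field, pattern in mapping.items():
        match = re.search(pattern, pdf_text)
        # A field that cannot be located is left empty rather than guessed.
        row[field] = match.group(1).strip() if match else ""
    return row


def rows_to_csv(rows, fieldnames):
    """Serialise the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample text standing in for the contents of one PDF.
sample_text = "Company name: Example Ltd\nReference number: 01234567\n"
row = extract_fields(sample_text, FIELD_MAPPING)
csv_text = rows_to_csv([row], list(FIELD_MAPPING))
```

In this sketch each PDF yields one CSV row, and the column order follows the order of the mapping.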

Initially, one PDF is processed and a database table is created from it. Thereafter, data from multiple PDFs can be appended to the table at the same time.
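
The create-then-append pattern can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden illustration, not a description of how Sheetloom stores data: the table name `pdf_data`, the helper functions, and the sample rows are all hypothetical.

```python
import sqlite3


def create_table(conn, first_row):
    """Create the table from the first PDF's fields, then store that row."""
    columns = ", ".join(f'"{name}" TEXT' for name in first_row)
    conn.execute(f"CREATE TABLE pdf_data ({columns})")
    append_rows(conn, [first_row])


def append_rows(conn, rows):
    """Append rows extracted from further PDFs to the existing table."""
    for row in rows:
        names = ", ".join(f'"{name}"' for name in row)
        placeholders = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO pdf_data ({names}) VALUES ({placeholders})",
            list(row.values()),
        )
    conn.commit()


# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# The first PDF defines the table structure.
create_table(conn, {"company_name": "Example Ltd", "reference_number": "01234567"})

# Later PDFs are appended in a batch (made-up sample values).
append_rows(conn, [
    {"company_name": "Sample plc", "reference_number": "07654321"},
    {"company_name": "Demo LLP", "reference_number": "00112233"},
])

row_count = conn.execute("SELECT COUNT(*) FROM pdf_data").fetchone()[0]

# Filtering on a unique field value, such as a reference number.
matches = conn.execute(
    "SELECT company_name FROM pdf_data WHERE reference_number = ?",
    ("07654321",),
).fetchall()
```

The final query shows the kind of filter on a unique field value that a parameterised template could apply.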

The final database table is mapped to an Excel template in the same way as any table or data source in Sheetloom. The template can be parameterised to allow filtering, including on unique field values such as company name or reference number.

Read the PDF Guide to learn how to configure Sheetloom to convert PDFs.