#ValueLeap with PDF Reader Utility – Stratus Driving Innovation with Automation

By Bhanu Regulagedda & Garima Goel

What are PDF documents?

PDF documents are compact, self-contained files. Almost every industry uses them to process documents, mainly because a PDF preserves its formatting regardless of the tool used to open it. In day-to-day life, invoices, official documents, contracts, boarding passes, bank statements, and similar documents are usually in PDF format.

In this blog, we will delve into Selenium testing of PDF files and how we have designed a solution to handle a PDF document using test automation.


Problem Statement:

For any insurance product, a successful transaction in the system generates legal and policy documents. These are downloaded and verified against a base template, and the dynamic content, such as the policy number, account number, insured name, policyholder address, coverages, and premiums, is authenticated manually.

This creates a heavy manual workload for the business users who support the system, including underwriters and adjusters.


Implemented Solution:

We designed a solution using the Apache PDFBox JARs. Before we start, we take a base template of the document and place it in the project's resources folder. In our Selenium automation test scripts, we perform the relevant transactions and download the generated PDF to a known location on the machine.

Once we have both PDFs, we first load the base template into the code using Java/Python. We then use the string libraries of the project's programming language to split the content with regular expressions, separating the static content from the dynamic content. The static content is stored in a variable. The dynamic values themselves are not stored; only their locations are identified and recorded.
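The splitting step can be sketched in Python as follows. This is a minimal illustration rather than the utility's actual code: it assumes the PDF text has already been extracted into a string, and the field labels, patterns, and the function name split_static_dynamic are hypothetical.

```python
import re

# Hypothetical patterns for the dynamic fields; group 1 captures the value.
# Real documents would need patterns that match their own labels.
DYNAMIC_PATTERNS = {
    "policy_number": re.compile(r"Policy Number: (\S+)"),
    "insured_name": re.compile(r"Insured Name: (.+)"),
}


def split_static_dynamic(text, patterns=DYNAMIC_PATTERNS):
    """Return (static_text, locations) for already-extracted PDF text.

    static_text is the input with each dynamic value replaced by a
    {field} placeholder. locations maps each field name to a list of
    (line, column) positions. The values themselves are discarded.
    """
    static_lines = []
    locations = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        for field, pattern in patterns.items():
            match = pattern.search(line)
            if match:
                # The column is measured after earlier fields on the same
                # line were replaced, so it does not depend on value lengths.
                locations.setdefault(field, []).append((line_no, match.start(1)))
                line = line[:match.start(1)] + "{" + field + "}" + line[match.end(1):]
        static_lines.append(line)
    return "\n".join(static_lines), locations


sample = (
    "Policy Number: PN-1001\n"
    "Insured Name: Jane Doe\n"
    "This policy is subject to the terms below."
)
static_text, locations = split_static_dynamic(sample)
```

Replacing each value with a placeholder, instead of deleting it, keeps the static text of two documents comparable line by line even when their values differ in length.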

We repeat the same procedure for the document downloaded from <X–Center> after the successful transaction: the static content is stored in a variable, and the locations of the dynamic content are stored in variables of their own.

We now have both the sample and the actual data. Using the Java/Python assert libraries, we assert that the static content is correct and that the dynamic content appears in the expected positions.
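The comparison can be sketched with plain Python assert statements. This is an illustrative sketch only: it assumes both PDFs have already been extracted to text, the field patterns and function names are hypothetical, and it includes a compact normalization step so that it runs on its own.

```python
import re

# Hypothetical field patterns; group 1 captures the dynamic value.
FIELDS = {
    "policy_number": re.compile(r"Policy Number: (\S+)"),
    "premium": re.compile(r"Premium: (\$[\d,.]+)"),
}


def normalize(text):
    """Replace dynamic values with placeholders; return (lines, locations)."""
    lines, locations = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for field, pattern in FIELDS.items():
            match = pattern.search(line)
            if match:
                locations.append((field, line_no, match.start(1)))
                line = line[:match.start(1)] + "{" + field + "}" + line[match.end(1):]
        lines.append(line)
    return lines, locations


def assert_matches_template(template_text, actual_text):
    """Fail if the static text or the dynamic field positions differ."""
    expected_lines, expected_locations = normalize(template_text)
    actual_lines, actual_locations = normalize(actual_text)
    assert actual_lines == expected_lines, "static content differs from the base template"
    assert actual_locations == expected_locations, "dynamic fields are missing or misplaced"


template = "Policy Number: PN-0000\nPremium: $0.00\nCoverage is subject to the terms below."
actual = "Policy Number: PN-48213\nPremium: $1,250.00\nCoverage is subject to the terms below."
assert_matches_template(template, actual)  # same static text, same field positions
```

A document whose static wording changed, or whose policy number is missing, would raise an AssertionError and fail the test run.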


Understanding Proposed Architecture:


#ValueLeap – Business Value Delivered

  • PDFs and other user documents must always carry accurate details, and the information in them must be verified. Our PDF Reader Utility provides seamless automated verification of the generated documents.
  • Validating and verifying a single document manually can be easy, but it becomes a major time challenge when many documents must be validated in a day. Automating this validation with our PDF Reader Utility reduces that burden on the business users, and the time saved can go to other tasks, improving the team's efficiency and productivity.
  • Using the PDF Reader Utility reduces human error during document authentication, supporting accuracy, positive policyholder feedback, and regulatory compliance.

Meet the Authors

Bhanu Regulagedda

Garima Goel

Associate SVP, Program Delivery, Stratus Global Technology Services