Multi-Page PDF with Distinct Layout Using Puppeteer

CodeStax.Ai
7 min read · May 14, 2024

These days, practically every company delivers data as PDFs, whether it is your bank statement or your order details. PDFs are easy to view on any device and to print for your records. Given how widely they are used, every developer should experiment with PDF-generation libraries such as pdfmake, PDFKit, and Puppeteer.

In this tutorial, I will first generate a PDF locally using Puppeteer-Core. After that, we will walk through an architecture for automating PDF generation with AWS services. As you may have noticed, I said Puppeteer-Core rather than Puppeteer, so let’s start by examining how they differ.

What is Puppeteer, and how is it different from Puppeteer-Core?

Puppeteer is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol. It runs in headless mode by default but can be configured to run in full (“headful”) Chrome/Chromium. If this definition makes you think of a browser-control library for web automation, you’re on the right track. It produces a PDF the same way you would manually when printing an HTML page from your browser. Puppeteer-Core is the same library without the bundled browser: it does not download Chrome when installed, so you must point it at a browser executable yourself. Here that executable comes from @sparticuz/chromium.

Initial Setup and Requirements

  1. Linux OS (Ubuntu)
  2. Node.js (18.17.0)
  3. Puppeteer-Core (21.5)
  4. @sparticuz/chromium (119.0.2)

After installing these requirements, your package.json will contain these entries.

"dependencies": {
  "@sparticuz/chromium": "^119.0.2",
  "puppeteer-core": "21.5"
}

With the initial setup done, we can move on to the code.

const puppeteer = require('puppeteer-core');
const chromium = require('@sparticuz/chromium');
const fs = require('fs');

// Optional: If you'd like to use the new headless mode. "shell" is the default.
chromium.setHeadlessMode = true;
// Optional: If you'd like to disable webgl, true is the default.
chromium.setGraphicsMode = false;

// Load the HTML file synchronously so the content is ready before the PDF is generated.
const htmlContent = fs.readFileSync('./test.html', 'utf8');
// Path where the generated PDF will be written.
const filePath = 'testFs.pdf';

async function generatePdf() {
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
  });
  try {
    const page = await browser.newPage();
    // setContent resolves once the page has fired its load event.
    await page.setContent(htmlContent, { waitUntil: 'load' });
    const pdfBuffer = await page.pdf({ format: 'A4' });
    console.log('[INFO] Pdf Is Generated Successfully');
    // Write the PDF buffer to the specified file path
    fs.writeFileSync(filePath, pdfBuffer);
    console.log('[INFO] PDF file saved successfully:', filePath);
  } finally {
    // Always close the browser, even if PDF generation fails.
    await browser.close();
  }
}

generatePdf().catch((err) => console.error('[ERROR] Pdf generation failed:', err));

I’m using test.html here, but you can use any HTML file of your own. The fs module handles reading the HTML file and writing the PDF.

In the PDF I produced, each page has a different layout. To get this, write each page’s content inside its own `div` tag and use the CSS `page-break-after` property to force a page break after it.

<div style="page-break-after: always;"> <!-- Your page Content --> </div>
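
As a rough sketch of how several such pages could be assembled, the snippet below joins page fragments into one HTML document and gives each page its own CSS class, so every page can be laid out differently. The helper name `buildMultiPageHtml`, the `layout` and `content` fields, and the class names are all illustrative assumptions, not part of Puppeteer.

```javascript
// Hypothetical helper: joins page fragments into one printable HTML document.
// Each entry carries its own layout class, so every page can be styled differently.
function buildMultiPageHtml(pages, css = '') {
  const body = pages
    .map(({ layout, content }, index) => {
      // Force a break after every page except the last one, where it is unnecessary.
      const breakStyle = index < pages.length - 1 ? 'page-break-after: always;' : '';
      return `<div class="${layout}" style="${breakStyle}">${content}</div>`;
    })
    .join('\n');
  return `<!DOCTYPE html>\n<html><head><style>${css}</style></head>\n<body>\n${body}\n</body></html>`;
}

// Example: a centered cover page followed by a compact details page.
const multiPageHtml = buildMultiPageHtml(
  [
    { layout: 'cover', content: '<h1>Receipt</h1>' },
    { layout: 'details', content: '<table><tr><td>Item</td></tr></table>' },
  ],
  '.cover { text-align: center; } .details { font-size: 12px; }'
);
console.log(multiPageHtml);
```

The resulting string can be passed to `page.setContent` in place of the contents of test.html.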

That completes the static part of PDF generation. Real-world PDFs, however, need dynamic content that changes from user to user.

One way to handle this is to put placeholders in the HTML template and then, in our JavaScript file, use a regular expression (RegExp) with the `replace` method to swap each placeholder for a dynamic value.

This snippet is part of the HTML template, where `${Receipt Number}` is the placeholder.

<td>
<strong>Receipt Number : </strong> <span>${Receipt Number}</span>
</td>

The JavaScript `replace` method will be used to replace this placeholder with dynamic data.

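// htmlString holds the HTML template as a string.
const pdfData = { 'Receipt Number': '10/04/2024' };
// Replace every ${key} placeholder with the matching value from pdfData.
const htmlContent = htmlString.replace(/\$\{([^}]+)\}/g, (match, key) => pdfData[key.trim()]);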

After performing this replace operation, `${Receipt Number}` will be replaced with `10/04/2024` in the HTML code, which will later be passed to Puppeteer for PDF generation. Now, let’s discuss the automation of PDF generation.
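
One caveat before moving on to automation: with the plain `replace` call, a placeholder that has no matching key is rendered as the text `undefined`, and a value containing `<` or `&` is inserted as raw HTML. The sketch below shows one possible way to guard against both. The names `fillTemplate` and `escapeHtml`, and the `'NA'` fallback, are assumptions made for illustration.

```javascript
// Escape characters that have special meaning in HTML so values render as plain text.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: replaces every ${key} placeholder with data[key],
// falling back to a default when the key is missing or null.
function fillTemplate(template, data, fallback = 'NA') {
  return template.replace(/\$\{([^}]+)\}/g, (match, key) => {
    const name = key.trim();
    const hasValue = Object.prototype.hasOwnProperty.call(data, name) && data[name] != null;
    return hasValue ? escapeHtml(data[name]) : fallback;
  });
}

const row = '<span>${Receipt Number}</span> <span>${Customer}</span> <span>${Notes}</span>';
console.log(fillTemplate(row, { 'Receipt Number': 'R-1001', Customer: 'A & B <Ltd>' }));
// prints: <span>R-1001</span> <span>A &amp; B &lt;Ltd&gt;</span> <span>NA</span>
```

If some values are meant to contain markup, escaping would need to be skipped for those keys.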

How to Automate PDF Generation Using AWS Services

Automation of PDF generation requires some AWS services such as:

  1. AWS Lambda
  2. AWS S3 Bucket

We attach this Lambda as a trigger to an event in our application, such as a database insertion, so that it runs every time that event occurs. The second piece is the S3 bucket, which stores the HTML template that is loaded into the browser to create the PDF.

Pdf Automation Design

The Pdf Automation Design diagram shows the steps involved in creating a PDF.

For demonstration purposes, I am using an AWS Lambda function that is triggered by DynamoDB operations, along with S3 buckets to fetch the template and store the newly created PDF. Let’s look at how to set up this whole architecture in our project:

  1. Create the HTML template and save it to the template-bucket on S3.
  2. Create a Lambda function that does four things:
    a. Retrieve the template from the template-bucket S3 bucket.
    b. Use the replace function to substitute the event data received in the Lambda function’s parameter for the placeholders in the template.
    c. Create the PDF from the populated template using Puppeteer.
    d. Put the generated PDF into another S3 bucket (pdf-bucket).
  3. Trigger the Lambda on DynamoDB operations: add the Lambda function as a trigger for every operation on our DynamoDB table.

The Lambda function below implements these steps:

const { S3Client, PutObjectCommand, GetObjectCommand } = require('@aws-sdk/client-s3');
const puppeteer = require('puppeteer-core');
const chromium = require('@sparticuz/chromium');
const { unmarshall } = require('@aws-sdk/util-dynamodb');

async function generatePdf(event) {
  const functionName = 'generatePdf Lambda';
  try {
    const record = event.Records[0];
    // Checking If Stream Is Of Insert Event Type.
    if (record.eventName !== 'INSERT') {
      throw new Error(`[Error] [${functionName}] This Stream Is Only Made For Insert Event Type`);
    }
    // Convert the DynamoDB-typed NewImage into a plain JavaScript object.
    const dynamoNewImageData = unmarshall(record.dynamodb.NewImage);
    const pdfData = dynamoNewImageData.documentData;
    const fileUploadLocation = 'output.pdf';
    const templateFileName = 'test.html';
    // This Will Store The Html Template For Pdf Generation
    let htmlContent;
    // This Will Store The Pdf Data After Generating Pdf
    let pdfBuffer;
    // Setting up the S3 client instance
    const s3Client = new S3Client({
      // credentials: yourCredentials, // optional inside Lambda, where the SDK uses the function's execution role by default
      region: 'your-region', // replace with your region
    });
    // Loading the HTML template from S3 and filling in its placeholders
    try {
      const htmlParams = {
        Bucket: 'template-bucket', // replace with your template bucket name
        Key: templateFileName
      };
      const response = await s3Client.send(new GetObjectCommand(htmlParams));
      const htmlString = await response.Body.transformToString();
      // Defining HtmlContent By Replacing The ${Data} present in HtmlString
      htmlContent = htmlString.replace(/\$\{([^}]+)\}/g, (match, key) =>
        (pdfData[key.trim()] !== undefined ? pdfData[key.trim()] : 'NA'));
      console.log(`[INFO] [${functionName}] Template Is Loaded And Populated With Data`);
    } catch (err) {
      console.log(`[Error] [${functionName}] Facing Error while Setting Template : ${err}`);
      throw new Error(`Facing Error while Setting Template : ${err}`);
    }
    // Generating pdf data and storing it in variable
    try {
      const browser = await puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        headless: chromium.headless,
        executablePath: await chromium.executablePath()
        // if you are using windows for testing provide the chrome.exe path.
        // executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe'
      });
      try {
        const page = await browser.newPage();
        // setContent resolves once the page has fired its load event.
        await page.setContent(htmlContent, { waitUntil: 'load' });
        pdfBuffer = await page.pdf({ format: 'A4' });
      } finally {
        // Always close the browser, even if PDF generation fails.
        await browser.close();
      }
      console.log(`[INFO] [${functionName}] Pdf Is Generated Successfully`);
    } catch (err) {
      console.log(`[Error] [${functionName}] Failed In Pdf Generation`);
      throw new Error(`Facing Error while generating pdf with puppeteer : ${err}`);
    }
    // Uploading PDF to S3 bucket.
    try {
      const uploadParams = {
        Bucket: 'pdf-bucket', // replace with your bucket name to store pdf
        Key: fileUploadLocation,
        Body: pdfBuffer,
        ContentType: 'application/pdf'
      };
      const objectInsertResponse = await s3Client.send(new PutObjectCommand(uploadParams));
      // httpStatusCode is a number, so compare it against 200 rather than '200'.
      const statusCode = objectInsertResponse.$metadata.httpStatusCode;
      if (statusCode === 200) {
        console.log(`[INFO] [${functionName}] [SUCCESS] Pdf Uploaded At ${fileUploadLocation}, httpstatus code 200`);
      } else {
        console.log(`[INFO] [${functionName}] [FAILED] Pdf Not Uploaded, Httpstatus Code ${statusCode}`);
      }
    } catch (err) {
      throw new Error(`Facing Error while Uploading Pdf To the Cloud : ${err}`);
    }
  } catch (error) {
    console.log(`[ERROR] [${functionName}] Api Main Error`, error);
    return { 'Error': `[ERROR] [${functionName}] Api Main Error` };
  }
}

// Export the function as the Lambda entry point (assumes the handler setting points at "handler" in this file).
exports.handler = generatePdf;
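
The Lambda above reads only `Records[0]` and always writes to `output.pdf`, so each new PDF overwrites the previous one. A DynamoDB stream can deliver several records in one invocation, so a possible refinement is to select every INSERT record and derive a distinct S3 key for each. The sketch below is one way to do that under stated assumptions: the helper name `selectInsertJobs` and the `receipts/` prefix are hypothetical, and it uses the stream record's `eventID` field as the unique part of the key.

```javascript
// Hypothetical helper: picks the INSERT records out of a DynamoDB stream batch
// and pairs each NewImage with its own S3 key for the generated PDF.
function selectInsertJobs(event, keyPrefix = 'receipts/') {
  const records = Array.isArray(event && event.Records) ? event.Records : [];
  return records
    .filter((record) => record.eventName === 'INSERT' && record.dynamodb && record.dynamodb.NewImage)
    .map((record) => ({
      // Still in DynamoDB JSON format; pass it to unmarshall() before use.
      newImage: record.dynamodb.NewImage,
      // eventID identifies the stream record, so each PDF gets a distinct key.
      pdfKey: `${keyPrefix}${record.eventID}.pdf`,
    }));
}

// Example batch: one INSERT and one MODIFY; only the INSERT yields a job.
const sampleEvent = {
  Records: [
    { eventID: 'e1', eventName: 'INSERT', dynamodb: { NewImage: { documentData: { M: {} } } } },
    { eventID: 'e2', eventName: 'MODIFY', dynamodb: { NewImage: {} } },
  ],
};
console.log(selectInsertJobs(sampleEvent));
```

Inside the handler, each job could then go through the same template, PDF, and upload steps, with `pdfKey` used in place of the fixed `fileUploadLocation`.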

We have come to the end of this article. I hope you learned something new, just like I did when I had a requirement to generate a PDF with multiple pages and different layouts. Discovering Puppeteer fulfilled my requirement, and I’m glad to share this knowledge with you. Thank you for your time. If you have any suggestions, please feel free to provide them in the comments section.

About the Author:

Akarshit Gupta is a Software Development Engineer (SDE1) at Codestax.ai. With a passion for technology, he enjoys sharing his insights on innovation, productivity, and personal development.

About CodeStax.Ai

At CodeStax.Ai, we stand at the nexus of innovation and enterprise solutions, offering technology partnerships that empower businesses to drive efficiency, innovation, and growth, harnessing the transformative power of no-code platforms and advanced AI integrations.

But the real magic? It’s our tech tribe behind the scenes. If you’ve got a knack for innovation and a passion for redefining the norm, we’ve got the perfect tech playground for you. CodeStax.Ai offers more than a job — it’s a journey into the very heart of what’s next. Join us, and be part of the revolution that’s redefining the enterprise tech landscape.
