tutorial // Jun 24, 2021
How to Generate a PDF in Node.js with Puppeteer and JavaScript
How to generate a PDF file and render it in the browser using Puppeteer and Express.
Getting started
For this tutorial, we're going to use the CheatCode Node.js Boilerplate to give us a starting point for our work. First, let's clone a copy of that to our computer:
Terminal
git clone https://github.com/cheatcode/nodejs-server-boilerplate.git server
Next, install the dependencies for the boilerplate:
Terminal
cd server && npm install
After that, we need to install the puppeteer package from NPM, which will help us generate our PDF:
Terminal
npm i puppeteer
Finally, start the development server up:
Terminal
npm run dev
After this, we have everything we need to do our work.
Creating a PDF generator function
Our first task is to write the function that we'll use to actually generate our PDF. This function will take in some HTML and CSS for the contents of our PDF and then output it as an actual PDF:
/lib/generatePDF.js
import puppeteer from "puppeteer";
export default (html = "") => {};
Here, we begin by importing the puppeteer dependency we installed earlier. This is what we'll use to generate our PDF. Below that import, we create a skeleton for our generatePDF() function, taking in a single argument, html, as a string.
/lib/generatePDF.js
import puppeteer from "puppeteer";

export default async (html = "") => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setContent(html);
};
Next, using the puppeteer package we imported up top, we create an instance of a web browser with puppeteer.launch(). Notice that here, we expect that function to return a JavaScript Promise, so we add the await keyword in front to say "wait for the Promise returned by this function to resolve before continuing with the rest of our code."
In order for this to work, we also add the async keyword just before our function definition up above. If we don't do this, JavaScript will refuse to run the file at all: it throws a syntax error complaining that await was used outside of an async function (the exact wording varies by runtime and build tool).
Once we have our Puppeteer browser instance, next, we create a new page with browser.newPage(). Though it may not look like it, this is like opening a tab in your web browser (Puppeteer controls what's known as a "headless" browser: a Chromium-based web browser running without a GUI or graphical user interface).
Again, we use the await keyword here. This is because all of the functions we'll use from Puppeteer return a JavaScript Promise. We want to await these Promises because our steps need to run in sequence (meaning, we want to ensure each step in our code is complete before moving on to the next).
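To see what awaiting each step buys us, here is a minimal sketch that doesn't involve Puppeteer at all. The step() helper and its delays are made up for illustration; it stands in for any function that returns a Promise.

```javascript
// Hypothetical stand-in for any Promise-returning call (e.g. a Puppeteer method).
// It resolves with its label after `ms` milliseconds.
const step = (label, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(label), ms));

// Each await pauses this function until the Promise resolves, so the steps
// finish in the order they are written, even though the first one is slowest.
const runInOrder = async () => {
  const finished = [];
  finished.push(await step("launch", 40));
  finished.push(await step("newPage", 20));
  finished.push(await step("setContent", 0));
  return finished;
};

// Without await, all three timers start at once and finish fastest-first.
const runWithoutAwait = () =>
  new Promise((resolve) => {
    const finished = [];
    const record = (label) => {
      finished.push(label);
      if (finished.length === 3) resolve(finished);
    };
    step("launch", 40).then(record);
    step("newPage", 20).then(record);
    step("setContent", 0).then(record);
  });
```

Here, runInOrder() resolves with the labels in the order they were written, while runWithoutAwait() resolves with them in reverse, because the shortest timer wins. With Puppeteer, that second ordering would mean trying to use a page before the browser exists.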
Finally, with our page available, we set the content of the page: the HTML markup that makes up what we would see in the browser if it wasn't headless.
At this point, if we were to be using a browser with a GUI, we'd see whatever HTML/CSS we passed in rendered on screen.
/lib/generatePDF.js
import puppeteer from "puppeteer";

export default async (html = "") => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setContent(html);

  const pdfBuffer = await page.pdf();

  await page.close();
  await browser.close();

  return pdfBuffer;
};
Building out the remainder of our function, now, we see how we go from rendering a page in the browser to getting a PDF. Here, we call the Puppeteer page.pdf() function. This is responsible for converting our HTML page into the PDF format.
Notice that we're calling this method on the page variable we created above and set the content on. In essence, this is saying "convert this page into a PDF." Optionally, you can pass options to page.pdf() to customize the look and feel of your PDF.
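As a sketch of what those options can look like, the object below uses a few settings that page.pdf() accepts (format, printBackground, landscape, and margin). The pdfOptions() helper and its default values are our own invention for this example, not part of Puppeteer.

```javascript
// Hypothetical helper: merges caller overrides onto some defaults.
// The keys are real page.pdf() options; the default values are just our picks.
const pdfOptions = (overrides = {}) => ({
  format: "A4", // paper size, e.g. "A4" or "Letter"
  printBackground: true, // include CSS background colors and images
  landscape: false, // portrait orientation
  ...overrides,
  // Merge margins side by side so overriding one side keeps the others.
  margin: {
    top: "20px",
    right: "20px",
    bottom: "20px",
    left: "20px",
    ...overrides.margin,
  },
});

// Inside generatePDF(), this could be used as:
//   const pdfBuffer = await page.pdf(pdfOptions({ format: "Letter" }));
```

Keeping the options in one place like this is a design choice, not a requirement; passing a plain object literal straight to page.pdf() works just as well.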
Though it may not look like much, this is all we need to do to get back our PDF file. You'll notice that we store the response to page.pdf() in a variable called pdfBuffer. This is because what we get in response is a file buffer, which is the in-memory representation of our PDF (meaning, the contents of the file before it's turned into an actual file we'd have on our computer).
Before we return this file buffer from our function at the bottom, we make sure to call page.close() and browser.close() to clear out our Puppeteer instance in memory. This is very important because if you don't, the browser that Puppeteer launched keeps running and holding on to memory after our PDF is generated. Meaning, every time someone calls this function, another browser instance is left behind. Do that enough times and your server will run out of memory, leading to a crash.
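One caveat with the function above: if page.setContent() or page.pdf() throws, the close calls never run and the browser is left open anyway. A common way to guard against that is a try/finally block. The withBrowser() helper below is a hypothetical sketch, not something from Puppeteer or the boilerplate; it takes the launcher as an argument so the pattern can be read (and tried out) on its own.

```javascript
// Hypothetical helper: launches a browser, hands it to `work`, and always
// closes it afterwards, whether `work` resolves or throws.
// `launch` is any function returning a Promise of an object with close(),
// for example () => puppeteer.launch().
const withBrowser = async (launch, work) => {
  const browser = await launch();

  try {
    return await work(browser);
  } finally {
    await browser.close();
  }
};

// With Puppeteer, generatePDF() could then be sketched as:
//   export default (html = "") =>
//     withBrowser(() => puppeteer.launch(), async (browser) => {
//       const page = await browser.newPage();
//       await page.setContent(html);
//       return page.pdf();
//     });
```

In this version only browser.close() is called, on the assumption that closing the browser also closes any pages it opened.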
With that, our generatePDF() function is complete. To finish out the tutorial, let's create an HTTP route on our server that we can use to call our generatePDF() function.
Wiring up a route to test our PDF generator
To test our PDF generation out, we're going to create an HTTP route using the Express server set up for us in the CheatCode Node.js Boilerplate we're building this app with. To make sure our wiring makes sense, real quick, let's look at how our Express server is set up and then where our code will live.
/index.js
import express from "express";
import startup from "./lib/startup";
import api from "./api/index";
import middleware from "./middleware/index";
import logger from "./lib/logger";

startup()
  .then(() => {
    const app = express();
    const port = process.env.PORT || 5001;

    middleware(app);
    api(app);

    app.listen(port, () => {
      if (process.send) {
        process.send(`Server running at http://localhost:${port}\n\n`);
      }
    });

    process.on("message", (message) => {
      console.log(message);
    });
  })
  .catch((error) => {
    logger.error(error);
  });
From the root of the project, the index.js file contains all of the code for starting up our Express server. Inside, the idea is that we have a startup() method which is called before we set up our HTTP server (this sets up our event listeners for errors and, if we wish, anything else that needs to be loaded before our HTTP server starts).
In the .then() callback for our startup() method, we call the familiar express() function, receiving our app instance in return. With this, we listen for connections on either process.env.PORT (typically set when deploying an app) or the default port 5001.
Just above our call to app.listen(), we call two functions, middleware() and api(), which take in our app instance. These functions are used to separate our code for organization. We're going to write our test route for generating a PDF inside of the api() function here.
Let's take a look at that function now:
/api/index.js
import generatePDF from "../lib/generatePDF";
import graphql from "./graphql/server";

export default (app) => {
  graphql(app);

  app.use("/pdf", (req, res) => {
    // We'll call to generatePDF() here...
  });
};
Taking in the app instance we passed in from /index.js, here, we set up the API for our server. By default, this boilerplate uses GraphQL for its main API, so here, we call to set up that GraphQL API via graphql(), also passing in the app instance. We won't use this for our work in this tutorial.
The part we care about is our call to app.use(), passing in the /pdf path where we expect our route to live. Our goal is to make it so that when we visit this route, we'll call generatePDF() (passing in some HTML and CSS) and then send the resulting PDF back as the response. The point is to render our PDF file in the browser (using the browser's built-in PDF viewer) so we can verify our function works and get access to a free download button.
/api/index.js
import generatePDF from "../lib/generatePDF";
import graphql from "./graphql/server";

export default (app) => {
  graphql(app);

  app.use("/pdf", async (req, res) => {
    const pdf = await generatePDF(`
      <html>
        <head>
          <title>Test PDF</title>
        </head>
        <body>
          <!-- The contents of our PDF will go here... -->
        </body>
      </html>
    `);

    res.set("Content-Type", "application/pdf");
    res.send(pdf);
  });
};
To achieve that, using the generatePDF() function we wrote earlier and have imported up top, inside of the callback function for our Express route, we add the async keyword like we learned about earlier and then call generatePDF(), passing in a string of HTML (we'll add to this next).
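One thing the route above doesn't cover is what happens when generatePDF() rejects (say, the browser fails to launch). Assuming the boilerplate is on Express 4, which does not catch rejections from async callbacks on its own, no response would be sent. A minimal sketch of guarding against that follows; the pdfRoute() factory and the error message are hypothetical names for this example.

```javascript
// Hypothetical factory: wraps a PDF generator in a route callback that
// answers with a 500 instead of leaving the request unanswered on failure.
// `generate` is any async function that returns the PDF, e.g. generatePDF.
const pdfRoute = (generate, html) => async (req, res) => {
  try {
    const pdf = await generate(html);

    res.set("Content-Type", "application/pdf");
    res.send(pdf);
  } catch (error) {
    console.error(error);
    res.status(500).send("Unable to generate PDF.");
  }
};

// In /api/index.js this could be wired up as:
//   app.use("/pdf", pdfRoute(generatePDF, "<html>...</html>"));
```

Passing the generator in as an argument is only there to keep the sketch self-contained; calling generatePDF() directly inside a try/catch in the route does the same job.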
Recall that when we call generatePDF(), we expect to get our PDF back as a file buffer (an in-memory representation of our PDF). What's neat about this is that, if we tell the browser making the request the format (the Content-Type) of our response, it will handle the data we send back to it differently.
Here, we use the .set() method on the HTTP response object (res), saying that "we want to set the Content-Type header to application/pdf." The application/pdf part is what's known as a MIME type. A MIME type is a standardized label for a file or data format that browsers recognize. Using that type, we can tell our browser "the data we're sending back in response to your request is in the following format."
After that, all we need to do is call the .send() method on res, passing in our pdf file buffer. The browser takes care of the rest!
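Content-Type isn't the only header that shapes what the browser does with our response. As a hedged sketch, the sendPDF() helper below (a hypothetical name, as are its filename and download options) also sets the standard Content-Disposition header, which hints whether the file should be shown inline or downloaded. It wraps the data in Buffer.from() as well, on the assumption that some Puppeteer versions hand back a Uint8Array rather than a Node.js Buffer.

```javascript
// Hypothetical helper: sends PDF bytes with headers describing the payload.
// `res` is an Express-style response; `pdf` is a Buffer or Uint8Array.
const sendPDF = (res, pdf, { filename = "document.pdf", download = false } = {}) => {
  // Copies the bytes into a Node.js Buffer, whichever of the two types we got.
  const body = Buffer.from(pdf);

  res.set({
    "Content-Type": "application/pdf",
    // "inline" asks the browser to display the file; "attachment" asks it to download.
    "Content-Disposition": `${download ? "attachment" : "inline"}; filename="${filename}"`,
  });

  res.send(body);
};

// In the route, this would replace the res.set() and res.send() calls:
//   sendPDF(res, pdf, { filename: "invoice.pdf" });
```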
Before we give this a test, let's flesh out our test HTML:
/api/index.js
import generatePDF from "../lib/generatePDF";
import graphql from "./graphql/server";

export default (app) => {
  graphql(app);

  app.use("/pdf", async (req, res) => {
    const pdf = await generatePDF(`
      <html>
        <head>
          <title>Test PDF</title>
          <style>
            body {
              padding: 60px;
              font-family: "Helvetica Neue", "Helvetica", "Arial", sans-serif;
              font-size: 16px;
              line-height: 24px;
            }
            body > h4 {
              font-size: 24px;
              line-height: 24px;
              text-transform: uppercase;
              margin-bottom: 60px;
            }
            body > header {
              display: flex;
            }
            body > header > .address-block:nth-child(2) {
              margin-left: 100px;
            }
            .address-block address {
              font-style: normal;
            }
            .address-block > h5 {
              font-size: 14px;
              line-height: 14px;
              margin: 0px 0px 15px;
              text-transform: uppercase;
              color: #aaa;
            }
            .table {
              width: 100%;
              margin-top: 60px;
            }
            .table table {
              width: 100%;
              border: 1px solid #eee;
              border-collapse: collapse;
            }
            .table table tr th,
            .table table tr td {
              font-size: 15px;
              padding: 10px;
              border: 1px solid #eee;
              border-collapse: collapse;
            }
            .table table tfoot tr td {
              border-top: 3px solid #eee;
            }
          </style>
        </head>
        <body>
          <h4>Invoice</h4>
          <header>
            <div class="address-block">
              <h5>Recipient</h5>
              <address>
                Doug Funnie<br />
                321 Customer St.<br />
                Happy Place, FL 17641<br />
              </address>
            </div>
            <div class="address-block">
              <h5>Sender</h5>
              <address>
                Skeeter Valentine<br />
                123 Business St.<br />
                Fake Town, TN 37189<br />
              </address>
            </div>
          </header>
          <div class="table">
            <table>
              <thead>
                <tr>
                  <th style="text-align:left;">Item Description</th>
                  <th>Price</th>
                  <th>Quantity</th>
                  <th>Total</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td style="text-align:left;">Swiss Army Cat</td>
                  <td style="text-align:center;">$32.70</td>
                  <td style="text-align:center;">x1</td>
                  <td style="text-align:center;">$32.70</td>
                </tr>
                <tr>
                  <td style="text-align:left;">Holeless Strainer</td>
                  <td style="text-align:center;">$9.00</td>
                  <td style="text-align:center;">x2</td>
                  <td style="text-align:center;">$18.00</td>
                </tr>
                <tr>
                  <td style="text-align:left;">"The Government Lies" T-Shirt</td>
                  <td style="text-align:center;">$20.00</td>
                  <td style="text-align:center;">x1</td>
                  <td style="text-align:center;">$20.00</td>
                </tr>
              </tbody>
              <tfoot>
                <tr>
                  <td colspan="2"></td>
                  <td style="text-align:right;"><strong>Total</strong></td>
                  <td style="text-align:center;">$70.70</td>
                </tr>
              </tfoot>
            </table>
          </div>
        </body>
      </html>
    `);

    res.set("Content-Type", "application/pdf");
    res.send(pdf);
  });
};
In the <head></head> tag of our HTML, we've added some CSS to style the markup we've added in our <body></body> tag. Though the specifics are out of the scope of this tutorial, what this gets us is a simple invoice design (a common use case for PDF rendering).
If we visit http://localhost:5001/pdf in our web browser, the built-in PDF reader should kick in and we should see our PDF rendered on screen. From here, we can use the download button in the top right to save a copy to our computer.
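In a real app, the invoice rows would likely come from data rather than hand-written markup. As a rough sketch (the invoiceRows(), dollars(), and escapeHTML() helpers, and the choice to store prices in cents, are all assumptions made for this example), the table body and total could be built like this before being dropped into the template string we pass to generatePDF():

```javascript
// Hypothetical helper: escapes characters that are special in HTML so item
// names like `"The Government Lies" T-Shirt` can't break the markup.
const escapeHTML = (text) =>
  String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

// Prices are kept in cents to avoid floating point rounding issues.
const dollars = (cents) => `$${(cents / 100).toFixed(2)}`;

// Hypothetical helper: returns the <tr> rows for <tbody> plus the formatted total.
const invoiceRows = (items) => {
  const rows = items
    .map(
      ({ description, price, quantity }) => `
        <tr>
          <td style="text-align:left;">${escapeHTML(description)}</td>
          <td style="text-align:center;">${dollars(price)}</td>
          <td style="text-align:center;">x${quantity}</td>
          <td style="text-align:center;">${dollars(price * quantity)}</td>
        </tr>`
    )
    .join("");

  const total = items.reduce((sum, { price, quantity }) => sum + price * quantity, 0);

  return { rows, total: dollars(total) };
};

// The same three items as the hand-written invoice above.
const { rows, total } = invoiceRows([
  { description: "Swiss Army Cat", price: 3270, quantity: 1 },
  { description: "Holeless Strainer", price: 900, quantity: 2 },
  { description: '"The Government Lies" T-Shirt', price: 2000, quantity: 1 },
]);
```

The resulting rows and total strings can then be interpolated into the <tbody> and <tfoot> of the template with ${rows} and ${total}.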
Wrapping up
In this tutorial, we learned how to convert HTML into a PDF using Puppeteer. We learned about creating a Puppeteer browser instance, opening a page on that instance, and setting the HTML content of that page. Next, we learned how to convert that HTML page into a PDF file buffer and then, once cached in a variable, close out the Puppeteer page and browser instance to save memory.
Finally, we learned how to take the PDF file buffer that we received from Puppeteer and render it in the browser using Express.