tutorial // Apr 14, 2021

How to Generate a Dynamic Sitemap with Next.js

How to generate a sitemap for your Next.js-based site or app dynamically to improve the discoverability of your site for search engines like Google and DuckDuckGo.

If you're building a site or app using Next.js that needs to be visible to search engines like Google, having a sitemap available is essential. A sitemap is a map of the URLs on your site and makes it easier for search engines to index your content, increasing the likelihood of ranking in search results.

In Next.js, because we rely on the built-in router to expose routes to the public, the easiest way to set up a sitemap is to create a special page component that modifies its response headers to signal to browsers that the content being returned is text/xml data (browsers and search engines anticipate our sitemap being returned as an XML file).

By doing this, we can leverage the usual data fetching and rendering conveniences of React and Next.js while simultaneously returning data in a format the browser expects.

To demonstrate how this works, we're going to utilize the CheatCode Next.js Boilerplate as a starting point. To get started, clone a copy from GitHub:

git clone https://github.com/cheatcode/nextjs-boilerplate.git

Next, cd into the cloned directory and install the boilerplate's dependencies via npm:

cd nextjs-boilerplate && npm install

Finally, start the boilerplate with (from the project's root directory):

npm run dev

Once all of this is complete, we're ready to get started building out our sitemap component.

Creating a sitemap page component

First, in the /pages directory at the root of the project, create a new file (file, not a folder) called sitemap.xml.js. The reason we're choosing this name is that Next.js will automatically create a route in our app at /sitemap.xml which is the location where browsers and search engine crawlers expect our sitemap to live.

Next, inside of the file, let's start building out the component:

/pages/sitemap.xml.js

import React from "react";

const Sitemap = () => {};

export default Sitemap;

The first thing you'll notice is that this component is just an empty function component (meaning we're not rendering any markup when the component is rendered by React). This is because, technically speaking, we don't want to render a component at this URL. Instead, we want to hijack the getServerSideProps method (this is called by Next.js as it receives an inbound request on the server) to say "instead of fetching some data and mapping it to the props for our component, override the res object (our response) and instead return the contents of our sitemap."

That's likely confusing. Fleshing this out a bit more, let's add a rough version of the res overrides we need to do:

/pages/sitemap.xml.js

import React from "react";

const Sitemap = () => {};

export const getServerSideProps = ({ res }) => {
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- We'll render the URLs for our sitemap here. -->
    </urlset>
  `;

  res.setHeader("Content-Type", "text/xml");
  res.write(sitemap);
  res.end();

  return {
    props: {},
  };
};

export default Sitemap;

This should make the "override" concept more concrete. Now, we can see that instead of returning an object of props from getServerSideProps, we're manually calling to set the Content-Type header of the response, write the response body, and end the request (signaling that the response should be sent back to the original request).

Here, we've spec'd out the basic template for a sitemap. Like we hinted at above, a sitemap is expected to be in an XML data format (or, text/xml MIME type). Next, when we fetch our data, we'll populate the <urlset></urlset> tag with <url></url> tags. Each tag will represent one of the pages in our site and provide the URL for that page.
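
As a rough sketch of the template idea, the wrapper can be pulled into a small helper that accepts the already-rendered <url></url> entries. The buildSitemap name is hypothetical and is not part of Next.js or the boilerplate:

```javascript
// Hypothetical helper: wraps pre-rendered <url> entries in the <urlset> envelope.
// The XML declaration is kept as the very first characters of the output.
const buildSitemap = (urlEntries = []) => {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urlEntries,
    "</urlset>",
  ].join("\n");
};

console.log(buildSitemap(["<url><loc>https://mydomain.com/about</loc></url>"]));
```

Joining an array of lines instead of using one large template string keeps stray indentation out of the response, though either approach produces valid XML.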

At the bottom of the getServerSideProps function, we handle our response to the inbound request.

First, we set the Content-Type header on the response to signal back to the browser that we're returning an .xml file. This works because the Content-Type sets expectations of what the browser needs to render and the sitemap.xml part of our sitemap.xml.js file's name is what Next.js uses for the URL of the page. So, if we called our page pizza.json.js, the URL generated by Next.js would be something like http://mydomain.com/pizza.json (in this case, we'll get http://mydomain.com/sitemap.xml).
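
To make the file-name-to-URL relationship concrete, here is a simplified illustration. The routeForPageFile function is hypothetical and only covers top-level files; it is not how Next.js implements routing internally:

```javascript
// Simplified illustration only: real Next.js routing also handles index files,
// nested folders, and dynamic segments.
const routeForPageFile = (fileName) => `/${fileName.replace(/\.js$/, "")}`;

console.log(routeForPageFile("sitemap.xml.js")); // "/sitemap.xml"
console.log(routeForPageFile("pizza.json.js")); // "/pizza.json"
```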

Next, we call to res.write(), passing the generated sitemap string. This will represent the response body that the browser (or search engine crawler) receives. After, we signal back that "we've sent all we can send" to the request with res.end().

To meet the requirements of the getServerSideProps function (per Next.js' rules), we return an object with a props property set to an empty object. To be clear, if we don't do this, Next.js will throw an error.
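
If you want to check the order of these calls without starting a server, one option is a small stand-in for res. Both sendXml and createFakeResponse below are hypothetical names used only for this sketch:

```javascript
// Hypothetical helper: sends an XML string using the same three response
// methods shown above (setHeader, write, end).
const sendXml = (res, xml) => {
  res.setHeader("Content-Type", "text/xml");
  res.write(xml);
  res.end();
};

// A minimal stand-in for `res` that records what it was asked to do.
const createFakeResponse = () => {
  const calls = [];
  return {
    calls,
    setHeader: (name, value) => calls.push(["setHeader", name, value]),
    write: (body) => calls.push(["write", body]),
    end: () => calls.push(["end"]),
  };
};

const fakeRes = createFakeResponse();
sendXml(fakeRes, "<urlset></urlset>");
console.log(fakeRes.calls.map(([method]) => method)); // [ 'setHeader', 'write', 'end' ]
```

Inside the page, the same helper could be called as sendXml(res, sitemap) just before returning the empty props object.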

Fetching data for your sitemap

Now for the fun part. Next, we need to get all of the content on our site that we want to represent in our sitemap. Typically this is everything, but you may have certain pages that you want to exclude.

When it comes to what content we're fetching to return in our sitemap, there are two types:

  1. Static pages - Pages that are located at a fixed URL in your site/app. For example, http://mydomain.com/about.
  2. Dynamic pages - Pages that are located at a variable URL in your site/app, like a blog post or some other dynamic content. For example, http://mydomain.com/posts/slug-of-my-post.

Retrieving this data is done in a couple of ways. First, for static pages, we can list out the contents of our /pages directory (filtering out the items we want to ignore). For dynamic pages, a similar approach can be taken, fetching data from a REST API or GraphQL API.

To start, let's look at fetching a list of the static pages in our app and how to add some filtering to trim down what we want:

/pages/sitemap.xml.js

import React from "react";
import fs from "fs";

const Sitemap = () => {};

export const getServerSideProps = ({ res }) => {
  const baseUrl = {
    development: "http://localhost:5000",
    production: "https://mydomain.com",
  }[process.env.NODE_ENV];

  const staticPages = fs
    .readdirSync("pages")
    .filter((staticPage) => {
      return ![
        "_app.js",
        "_document.js",
        "_error.js",
        "sitemap.xml.js",
      ].includes(staticPage);
    })
    .map((staticPagePath) => {
      return `${baseUrl}/${staticPagePath}`;
    });

  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      ${staticPages
        .map((url) => {
          return `
            <url>
              <loc>${url}</loc>
              <lastmod>${new Date().toISOString()}</lastmod>
              <changefreq>monthly</changefreq>
              <priority>1.0</priority>
            </url>
          `;
        })
        .join("")}
    </urlset>
  `;

  res.setHeader("Content-Type", "text/xml");
  res.write(sitemap);
  res.end();

  return {
    props: {},
  };
};

export default Sitemap;

We've added three big things here:

First, we've added a new baseUrl value at the top of our getServerSideProps function which will allow us to set the base of each URL that we render in our sitemap. This is necessary because our sitemap must include absolute URLs.
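
One thing to watch for: the object lookup returns undefined for any NODE_ENV that isn't listed (for example, test), which would produce URLs beginning with undefined/. A minimal sketch of a fallback, where getBaseUrl is a hypothetical helper and the URLs are the placeholders used in this tutorial:

```javascript
// The URLs are the placeholders used in this tutorial; swap in your own.
const BASE_URLS = {
  development: "http://localhost:5000",
  production: "https://mydomain.com",
};

// Hypothetical helper: falls back to the development URL for unlisted environments.
const getBaseUrl = (env = process.env.NODE_ENV) => {
  return BASE_URLS[env] || BASE_URLS.development;
};

console.log(getBaseUrl("production")); // "https://mydomain.com"
console.log(getBaseUrl("test")); // "http://localhost:5000"
```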

Second, we've added a call to the fs.readdirSync() function (with fs imported at the top of the file) which is the synchronous read directory method that's built into Node.js. This allows us to get the file list of a directory at the path we pass (here, we specify the pages directory because we want to get all of our static pages).

Once fetched, we make a point to call .filter() on the array that we expect to get back, filtering out the utility pages in our site (including sitemap.xml.js itself) that we do not want present in our sitemap. After this we map over each of the valid pages and concatenate their path with the baseUrl we determined based on our current NODE_ENV up top.

If we were to console.log(staticPages), the end result of this should look something like this:

[
  'http://localhost:5000/documents',
  'http://localhost:5000/login',
  'http://localhost:5000/recover-password',
  'http://localhost:5000/reset-password',
  'http://localhost:5000/signup'
]
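
The output above has no file extensions, which suggests the boilerplate's pages live in folders. fs.readdirSync() returns file names with their extension, so if your pages are files like about.js, you would likely want to strip it. A sketch under that assumption, with toStaticPageUrls as a hypothetical helper:

```javascript
const IGNORED = ["_app.js", "_document.js", "_error.js", "sitemap.xml.js"];

// Hypothetical helper: `entries` is what fs.readdirSync("pages") would return.
const toStaticPageUrls = (entries, baseUrl, ignored = IGNORED) => {
  return entries
    .filter((entry) => !ignored.includes(entry))
    // Assumption: top-level page files should lose their .js extension.
    .map((entry) => `${baseUrl}/${entry.replace(/\.js$/, "")}`);
};

console.log(
  toStaticPageUrls(["_app.js", "documents", "login", "about.js"], "http://localhost:5000")
);
```

Taking the directory listing as an argument keeps the function easy to test; in the page itself, you would pass it the result of fs.readdirSync("pages").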

Third, focusing back on our sitemap variable where we store our sitemap as a string (before passing to res.write()), we can see we've modified this to perform a .map() over our staticPages array, returning a string containing the necessary markup for adding a URL to our sitemap:

/pages/sitemap.xml.js

${staticPages
  .map((url) => {
    return `
      <url>
        <loc>${url}</loc>
        <lastmod>${new Date().toISOString()}</lastmod>
        <changefreq>monthly</changefreq>
        <priority>1.0</priority>
      </url>
    `;
  })
  .join("")}

In terms of what we're returning, here we return the XML content expected by a web browser (or search engine crawler) when reading a sitemap. For each URL in our site that we want to add to our map, we add the <url></url> tag, placing a <loc></loc> tag inside that specifies the location of our URL, the <lastmod></lastmod> tag that specifies when the content at the URL was last updated, the <changefreq></changefreq> tag that specifies how frequently the content at the URL is updated, and a <priority></priority> tag to specify the importance of the URL relative to the other URLs on our site (a hint that tells crawlers which pages we consider most important).
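
Because these values end up inside XML, characters like & in a URL need to be entity-escaped or the sitemap will not parse. A minimal sketch of rendering a single entry with escaping, where escapeXml and renderUrlEntry are hypothetical helpers rather than anything provided by Next.js:

```javascript
// Escapes the five characters that XML treats as special.
const escapeXml = (value) => {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
};

// Hypothetical helper: renders one <url> entry from plain values.
const renderUrlEntry = ({ loc, lastmod, changefreq = "monthly", priority = "1.0" }) => {
  return [
    "<url>",
    `  <loc>${escapeXml(loc)}</loc>`,
    `  <lastmod>${escapeXml(lastmod)}</lastmod>`,
    `  <changefreq>${escapeXml(changefreq)}</changefreq>`,
    `  <priority>${escapeXml(priority)}</priority>`,
    "</url>",
  ].join("\n");
};

console.log(
  renderUrlEntry({
    loc: "https://mydomain.com/search?q=a&page=2",
    lastmod: "2021-04-14T01:36:47.469Z",
  })
);
```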

Here, we pass our url to <loc></loc> and then set our <lastmod></lastmod> to the current date as an ISO-8601 string (a standard type of computer/human-readable date format). If you have a date available for when these pages were last updated, it's best to be as accurate as possible with this date and pass that specific date here.
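
If your dates come from different sources (a Date object, a timestamp, or a string), a small normalizer can keep the <lastmod></lastmod> value consistent. This is only a sketch; toLastmod is a hypothetical helper, and falling back to the current time is an assumption you may not want in production:

```javascript
// Hypothetical helper: returns an ISO-8601 string for a Date, timestamp, or
// date string, falling back to the current time when the value is missing
// or cannot be parsed.
const toLastmod = (value) => {
  const date = value ? new Date(value) : new Date();
  return Number.isNaN(date.getTime()) ? new Date().toISOString() : date.toISOString();
};

console.log(toLastmod("2021-04-14T01:36:47.469Z")); // "2021-04-14T01:36:47.469Z"
console.log(toLastmod(new Date(Date.UTC(2021, 3, 14)))); // "2021-04-14T00:00:00.000Z"
```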

For <changefreq></changefreq>, we're setting a sensible default of monthly, but this can be any of the following:

  • never
  • yearly
  • monthly
  • weekly
  • daily
  • hourly
  • always

Similar to the <lastmod></lastmod> tag, you will want this to be as accurate as possible to avoid any issues with a search engine's rules.
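
If the frequency comes from data you don't fully control, it can help to guard against values outside the list above. A minimal sketch, with toChangefreq as a hypothetical helper:

```javascript
// The values listed above.
const CHANGE_FREQUENCIES = ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"];

// Hypothetical helper: returns the value if it is valid, otherwise the fallback.
const toChangefreq = (value, fallback = "monthly") => {
  return CHANGE_FREQUENCIES.includes(value) ? value : fallback;
};

console.log(toChangefreq("weekly")); // "weekly"
console.log(toChangefreq("fortnightly")); // "monthly"
```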

Finally, for <priority></priority>, we set a base of 1.0 (the maximum level of importance). If you wish to change this to be more specific, this number can be anything between 0.0 and 1.0 with 0.0 being unimportant, 1.0 being most important.
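
A sketch of keeping the priority inside that range, assuming you want out-of-range numbers clamped rather than rejected. The toPriority helper is hypothetical; the 0.5 fallback mirrors the default that the sitemap protocol assigns to a page with no priority:

```javascript
// Hypothetical helper: clamps a number to the 0.0-1.0 range and formats it
// with one decimal place. Non-numeric input falls back to "0.5".
const toPriority = (value) => {
  const number = Number(value);
  if (Number.isNaN(number)) return "0.5";
  return Math.min(1, Math.max(0, number)).toFixed(1);
};

console.log(toPriority(0.8)); // "0.8"
console.log(toPriority(3)); // "1.0"
console.log(toPriority(-1)); // "0.0"
```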

Though it may not look like much now, technically, if we visit http://localhost:5000/sitemap.xml in our browser (assuming you're working with the CheatCode Next.js Boilerplate and started the dev server earlier), we should see a sitemap containing our static pages!

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:5000/documents</loc>
    <lastmod>2021-04-14T01:36:47.469Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/login</loc>
    <lastmod>2021-04-14T01:36:47.469Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/recover-password</loc>
    <lastmod>2021-04-14T01:36:47.469Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/reset-password</loc>
    <lastmod>2021-04-14T01:36:47.469Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/signup</loc>
    <lastmod>2021-04-14T01:36:47.469Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>

Next, let's take a look at expanding our sitemap by fetching our dynamic pages using GraphQL.

Generating dynamic data for our sitemap

Because we're using the CheatCode Next.js Boilerplate for our example, we already have the wiring necessary for a GraphQL client. To contextualize our work, we're going to use this feature in conjunction with the CheatCode Node.js Boilerplate which includes an example database using MongoDB, a fully-implemented GraphQL server, and an example Documents collection that we can use to pull test data from.

First, let's clone a copy of the Node.js Boilerplate and get it set up:

git clone https://github.com/cheatcode/nodejs-server-boilerplate.git

And then cd into the cloned project and install all dependencies:

cd nodejs-server-boilerplate && npm install

Finally, go ahead and run the server with (from the root of the project):

npm run dev

If you go ahead and open up the project, we're going to add a little bit of code to seed the database with some documents so we actually have something to fetch for our sitemap:

/api/fixtures/documents.js

import _ from "lodash";
import generateId from "../../lib/generateId";
import Documents from "../documents";
import Users from "../users";

export default async () => {
  let i = 0;

  const testUser = await Users.findOne();
  const existingDocuments = await Documents.find().count();

  if (existingDocuments < 100) {
    while (i < 100) {
      const title = `Document #${i + 1}`;

      await Documents.insertOne({
        _id: generateId(),
        title,
        userId: testUser?._id,
        content: "Test content.",
        createdAt: new Date().toISOString(),
        updatedAt: new Date().toISOString(),
      });

      i += 1;
    }
  }
};

First, we need to create a file to hold a fixture (a nickname for code that generates test data for us) that will generate our test documents for us. To do it, we export a function that does a few things:

  1. Retrieves a test user (created by the included /api/fixtures/users.js fixture included with the boilerplate).
  2. Retrieves the existing .count() of documents in the database.
  3. If fewer than 100 documents exist, runs a while loop that inserts 100 test documents (the loop continues while the counter i is less than 100).

For the contents of the document, we generate a title that utilizes the current i iteration of the loop plus one to generate a different title for each generated document. Next, we call the Documents.insertOne() function, provided by our import of the Documents collection (already implemented in the boilerplate), to insert one document.

That document includes an _id set to a hex string using the included generateId() function in the boilerplate. Next, we set the title, followed by the userId set to the _id of the testUser we retrieved, and then we set some dummy content along with a createdAt and updatedAt timestamp for good measure (these will come into play in our sitemap next).
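
As an alternative sketch, the document construction can be separated from the database call, which makes it possible to inspect the generated data before inserting it. The buildTestDocuments helper and its createId parameter are hypothetical; createId stands in for the boilerplate's generateId() function:

```javascript
// Hypothetical helper: builds (but does not insert) `count` test documents.
// `createId` stands in for the boilerplate's generateId() function.
const buildTestDocuments = (count, userId, createId) => {
  const now = new Date().toISOString();

  return Array.from({ length: count }, (_, index) => ({
    _id: createId(),
    title: `Document #${index + 1}`,
    userId,
    content: "Test content.",
    createdAt: now,
    updatedAt: now,
  }));
};

let counter = 0;
const documents = buildTestDocuments(3, "user-1", () => `id-${(counter += 1)}`);
console.log(documents.map(({ title }) => title)); // [ 'Document #1', 'Document #2', 'Document #3' ]
```

The resulting array could then be passed to a bulk insert such as the MongoDB driver's insertMany(), assuming the Documents collection exposes it.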

/api/index.js

import graphql from "./graphql/server";
import usersFixture from "./fixtures/users";
import documentsFixture from "./fixtures/documents";

export default async (app) => {
  graphql(app);
  await usersFixture();
  await documentsFixture();
};

To make all of this work, we need to pull the included users fixture and our new documents fixture into the /api/index.js file (this file is automatically loaded for us on server startup). Because our fixtures are exported as functions, after we import them, in the function exported from /api/index.js, we call those functions, making sure to await the calls to avoid race conditions with our data (remember, our user needs to exist before we try to create documents).
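
To see why the await matters, here is a small simulation. The two fixtures below are stand-ins that use timers instead of a database, and the users fixture is deliberately the slower of the two:

```javascript
const order = [];
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in fixtures: the users fixture takes longer than the documents fixture.
const usersFixture = async () => {
  await wait(20);
  order.push("users");
};

const documentsFixture = async () => {
  await wait(5);
  order.push("documents");
};

// Awaiting each call guarantees users finish before documents start.
const runFixtures = async () => {
  await usersFixture();
  await documentsFixture();
  return order;
};

const fixturesDone = runFixtures();
fixturesDone.then((result) => console.log(result)); // [ 'users', 'documents' ]
```

Without the await keywords, both timers would start together and the faster documents fixture would finish first.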

Before we move on, we need to make one more tiny change to ensure we can fetch documents for our test:

/api/documents/graphql/queries.js

import isDocumentOwner from "../../../lib/isDocumentOwner";
import Documents from "../index";

export default {
  documents: async (parent, args, context) => {
    return Documents.find().toArray();
  },
  [...]
};

By default, the example documents resolver in the Node.js Boilerplate passes a query to the Documents.find() method requesting back documents only for the logged-in user's _id. Here, we can remove this query and just ask for all of the documents back since we're just testing this out.

That's it on the server side. Let's jump back to the client and wire this up to our sitemap.

Fetching data from our GraphQL API

Like we saw in the last section, the Node.js Boilerplate also includes a fully configured GraphQL server and existing resolvers for fetching documents. Back in our /pages/sitemap.xml.js file, let's pull in the included GraphQL client in the Next.js Boilerplate and fetch some data from the existing documents resolver in the GraphQL API:

/pages/sitemap.xml.js

import React from "react";
import fs from "fs";
import { documents as documentsQuery } from "../graphql/queries/Documents.gql";
import client from "../graphql/client";

const Sitemap = () => {};

export const getServerSideProps = async ({ res }) => {
  const baseUrl = {
    development: "http://localhost:5000",
    production: "https://mydomain.com",
  }[process.env.NODE_ENV];

  const staticPages = fs
    .readdirSync("pages")
    .filter((staticPage) => {
      return ![
        "_app.js",
        "_document.js",
        "_error.js",
        "sitemap.xml.js",
      ].includes(staticPage);
    })
    .map((staticPagePath) => {
      return `${baseUrl}/${staticPagePath}`;
    });

  const { data } = await client.query({ query: documentsQuery });
  const documents = data?.documents || [];

  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      ${staticPages
        .map((url) => {
          return `
            <url>
              <loc>${url}</loc>
              <lastmod>${new Date().toISOString()}</lastmod>
              <changefreq>monthly</changefreq>
              <priority>1.0</priority>
            </url>
          `;
        })
        .join("")}
      ${documents
        .map(({ _id, updatedAt }) => {
          return `
              <url>
                <loc>${baseUrl}/documents/${_id}</loc>
                <lastmod>${updatedAt}</lastmod>
                <changefreq>monthly</changefreq>
                <priority>1.0</priority>
              </url>
            `;
        })
        .join("")}
    </urlset>
  `;

  res.setHeader("Content-Type", "text/xml");
  res.write(sitemap);
  res.end();

  return {
    props: {},
  };
};

export default Sitemap;

Up at the top of the file, we've imported the example GraphQL query file from the /graphql/queries/Documents.gql file included in the CheatCode Next.js Boilerplate. Below that, we also import the included GraphQL client from /graphql/client.js.

Back in our getServerSideProps function, we add in a call to client.query() to execute a GraphQL query for our documents just beneath our earlier call to get our staticPages. With our list in tow, we repeat the same pattern we saw earlier, .map()ing over the documents we found and using the same XML structure we used with our static pages.

The big difference here is that for our <loc></loc>, we're building our URL by hand inside of the .map(), utilizing our existing baseUrl value and appending /documents/${_id} to it, where _id is the unique ID of the current document we're mapping over. We've also swapped the inline call to new Date().toISOString() passed to <lastmod></lastmod> with the updatedAt timestamp we set in the database.
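
The same mapping can be sketched as a standalone function that is easier to test. The toDocumentUrls helper is hypothetical; skipping documents without an _id and falling back to the current time for a missing updatedAt are assumptions, not behavior from the boilerplate:

```javascript
// Hypothetical helper: turns documents into { loc, lastmod } pairs.
const toDocumentUrls = (documents, baseUrl) => {
  return documents
    // Assumption: a document without an _id cannot be linked to, so skip it.
    .filter((document) => document && document._id)
    .map(({ _id, updatedAt }) => ({
      loc: `${baseUrl}/documents/${encodeURIComponent(_id)}`,
      lastmod: updatedAt || new Date().toISOString(),
    }));
};

console.log(
  toDocumentUrls(
    [{ _id: "y9QSUXFlSqzl3ZzN", updatedAt: "2021-04-14T02:27:06.747Z" }, {}],
    "http://localhost:5000"
  )
);
```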

That's it! If you visit http://localhost:5000/sitemap.xml in the browser, you should see our existing static pages, along with our dynamically generated document URLs:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:5000/documents</loc>
    <lastmod>2021-04-14T03:06:24.018Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/login</loc>
    <lastmod>2021-04-14T03:06:24.018Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/recover-password</loc>
    <lastmod>2021-04-14T03:06:24.018Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/reset-password</loc>
    <lastmod>2021-04-14T03:06:24.018Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/signup</loc>
    <lastmod>2021-04-14T03:06:24.018Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/documents/y9QSUXFlSqzl3ZzN</loc>
    <lastmod>2021-04-14T02:27:06.747Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/documents/6okKJ3vHX5K0F4A1</loc>
    <lastmod>2021-04-14T02:27:06.749Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://localhost:5000/documents/CdyxBJnVk70vpeSX</loc>
    <lastmod>2021-04-14T02:27:06.750Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  [...]
</urlset>

From here, once your site is deployed online, you can submit your sitemap to search engines like Google to ensure your site is properly indexed and ranked.

Handling Next.js Build Issues on Vercel

For developers who are attempting to make the above code work on Vercel, a small change needs to be made to the call to fs.readdirSync() above. Instead of using fs.readdirSync("pages") like we show above, you will need to modify your code to look like this:

/pages/sitemap.xml.js

const staticPages = fs
  .readdirSync({
    development: 'pages',
    production: './',
  }[process.env.NODE_ENV])
  .filter((staticPage) => {
    return ![
      "_app.js",
      "_document.js",
      "_error.js",
      "sitemap.xml.js",
    ].includes(staticPage);
  })
  .map((staticPagePath) => {
    return `${baseUrl}/${staticPagePath}`;
  });

The change here is what we pass to fs.readdirSync(). In a Vercel deployed Next.js app, the path to your pages directory changes. Adding in a conditional path like we see above ensures that when your sitemap code runs, it resolves pages to the correct path (in this case, to the /build/server/pages directory generated when Vercel builds your app).
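
A slightly more defensive sketch of the same idea, in case the directory can't be read at all in a given environment. The getPagesDirectory and listPages helpers are hypothetical, and returning an empty list on failure is an assumption you may prefer to replace with logging or an error:

```javascript
// The same two paths used above.
const PAGES_DIRECTORIES = { development: "pages", production: "./" };

const getPagesDirectory = (env = process.env.NODE_ENV) => {
  return PAGES_DIRECTORIES[env] || PAGES_DIRECTORIES.development;
};

// `readDirectory` is injected so this runs anywhere; in the page, pass fs.readdirSync.
const listPages = (readDirectory, env) => {
  try {
    return readDirectory(getPagesDirectory(env));
  } catch (error) {
    // Assumption: an empty list is an acceptable fallback when the directory is missing.
    return [];
  }
};

console.log(listPages((directory) => [`read:${directory}`], "production")); // [ 'read:./' ]
console.log(listPages(() => { throw new Error("ENOENT"); }, "development")); // []
```

With this in place, a missing directory would leave the static pages out of the sitemap while still letting the dynamic document URLs render.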

Wrapping up

In this tutorial, we learned how to dynamically generate a sitemap with Next.js. We learned how to utilize the getServerSideProps function in Next.js to hijack the response to requests made to the /sitemap.xml page in our app and return an XML string, forcing the Content-Type header to be text/xml to simulate returning an .xml file.

We also looked at generating some test data in MongoDB using Node.js and retrieving that data for inclusion in our sitemap via a GraphQL query.

Written By
Ryan Glover

CEO/CTO @ CheatCode