tutorial // Mar 17, 2023

How to Create a Tar Archive Using Node.js

How to use the tar command line tool to generate a tarball archive using Node.js.

Getting Started

Because the code we're writing for this tutorial is "standalone" (meaning it's not part of a bigger app or project), we're going to create a Node.js project from scratch. If you don't already have Node.js installed on your computer, read this tutorial first and then come back here.

Once you have Node.js installed on your computer, from your projects folder on your computer (e.g., ~/projects), create a new folder for our work:

Terminal

mkdir app

Next, cd into that directory and create an index.js file (this is where we'll write our code for the tutorial):

Terminal

cd app && touch index.js

One last step: from inside of the app folder, run npm init -f to create a package.json file, then add the field "type": "module" to it as a property. This will enable ESModules support and allow us to use the import statements shown in the code below.

With that in place, we're ready to get started.

Building out our test data

Our goal for this tutorial will be to create a .tar archive containing a folder inside of our project that we're going to create now called /data. While we can technically pass any path we want to create our .tar archive (often referred to as a "tarball") from, we're going to use this /data directory to make things a bit clearer. Inside, we want to create four (4) empty text files, aiming for a structure like this:

├── /data
│   ├── file1.txt
│   ├── file2.txt
│   ├── file3.txt
│   └── file4.txt
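If you'd rather not create these files by hand, a short Node.js script can scaffold them. This is only a sketch: it assumes it's run from the root of the project, and note that it will overwrite any existing file1.txt through file4.txt with empty files.

```javascript
import fs from 'fs';

// Create the data folder (no error is thrown if it already exists).
fs.mkdirSync('data', { recursive: true });

// Create file1.txt through file4.txt as empty text files.
for (let i = 1; i <= 4; i += 1) {
  fs.writeFileSync(`data/file${i}.txt`, '');
}

console.log(fs.readdirSync('data'));
```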

To add some context and help us understand a more complex archive, next, inside of the /data folder, we want to create a package.json file and install the package dayjs. While we won't be writing any code in here, we want to demonstrate how to ignore folders like node_modules when we create our archive.

Terminal

cd data && npm init -f && npm i dayjs

This will automatically change into the data directory, initialize our package.json file, and install dayjs.

Before we move on, outside of our /data folder (at the root of our project), we want to create a .gitignore file. Inside, we want to add the following:

.gitignore

node_modules

Here, we're specifying that if this project were a Git repository (it isn't here), Git should ignore the node_modules folder. As we'll see next, we're going to reuse this file to help build the exclude list for our .tar archive.

With all of that scaffolded out, we're ready to dig into the code for creating our Tar archive. To start, we're going to write a function to help us build out our exclude list.

Writing a function to build our exclude list

When working with a Tar archive, an "exclude list" is just a list of files or directories to exclude from an archive. While we could technically just pass an exclude list manually to our tar call, writing this function allows us to account for a more complex setup. To start, we're going to create a separate file at /getTarExcludeList.js in our project (from our top-level project, not inside of the /data directory we created above):

/getTarExcludeList.js

import fs from 'fs';

export default (excludedPaths = []) => {
  const gitignore = fs.existsSync('.gitignore') ? fs.readFileSync('.gitignore', 'utf-8') : '';
  const gitignoreFiles = gitignore?.split('\n')?.filter((file) => {
    return !file?.includes('#') && file?.trim() !== '';
  });

  // We'll implement the rest of our function here...
};

To start, we define a function which takes an argument excludedPaths, expecting it to be an array (the = [] after the argument definition sets a default value in the event that excludedPaths is undefined, i.e., when no argument is passed; note that passing null does not trigger the default).
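To see how a default parameter value behaves, here's a quick sketch. The describePaths function is a hypothetical helper used only for this illustration; the default only kicks in when the argument is undefined, not when it's null.

```javascript
// Hypothetical helper, used only to illustrate default parameter behavior.
const describePaths = (excludedPaths = []) => {
  return Array.isArray(excludedPaths) ? excludedPaths.length : 'not an array';
};

console.log(describePaths());          // 0 (default applied)
console.log(describePaths(undefined)); // 0 (default applied)
console.log(describePaths(null));      // 'not an array' (default NOT applied)
console.log(describePaths(['a.txt'])); // 1
```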

Next, inside of the function body, we start by retrieving the .gitignore file from the project, if it exists (in this case it does, but we want to write code that won't break in the event it's not there). To do that, we use the built-in Node.js fs package (imported up top). Using a ternary operator, we first call fs.existsSync() to make sure the file exists and, if it does, read it as a utf-8 formatted string with fs.readFileSync(). In the event that it doesn't exist, we fall back to an empty string.

Next, assuming we found a .gitignore, we call the .split() method on it, "splitting" the string into an array of strings using the \n or "newline" character as our split point (i.e., every time we hit a newline character, start a new item/string in the array). So that's clear, we expect that .split() to return something like this given our current .gitignore file:

['node_modules']

Next, on this resulting array, we want to run a filter to exclude two potential bits:

  1. Lines including a # hash which indicates a comment line in a .gitignore file.
  2. Lines that are blank/empty.

Assuming that the line we're currently iterating over with .filter() doesn't include a # and isn't empty, we include it in the final array we store in gitignoreFiles, otherwise we discard it.
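To make the split-and-filter step concrete, here's a small sketch run against a made-up .gitignore string (the contents are an assumption for the example, not the file from our project):

```javascript
// Sample .gitignore contents, made up for this example.
const gitignore = [
  '# dependencies',
  'node_modules',
  '',
  '.env',
  '# build output',
  'dist',
  '',
].join('\n');

// Same logic as our function: drop comment lines and blank lines.
const gitignoreFiles = gitignore.split('\n').filter((line) => {
  return !line.includes('#') && line.trim() !== '';
});

console.log(gitignoreFiles); // ['node_modules', '.env', 'dist']
```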

getTarExcludeList.js

import fs from 'fs';

export default (excludedPaths = []) => {
  const gitignore = fs.existsSync('.gitignore') ? fs.readFileSync('.gitignore', 'utf-8') : '';
  const gitignoreFiles = gitignore?.split('\n')?.filter((file) => {
    return !file?.includes('#') && file?.trim() !== '';
  });

  const ignoreFiles = [
    ...gitignoreFiles,
    ...excludedPaths,
    "*.tar",
    "*.tar.gz",
    "*.tar.xz"
  ]?.filter((item, itemIndex, array) => {
    return array.indexOf(item) === itemIndex;
  });
  
  const excludeList = `{${ignoreFiles?.map((ignoreFile) => `"${ignoreFile}"`).join(',')}}`;
  
  return excludeList;
};

Next, we build out our master ignoreFiles array, combining the entries from our .gitignore file, any excludedPaths passed in, and three wildcard patterns that match existing .tar, .tar.gz, and .tar.xz archives (we want to avoid a "turtles all the way down" scenario where we accidentally re-archive an existing archive).

On the end of that array definition, we chain on another .filter(), this time filtering out any duplicate entries in the event that our .gitignore file and excludedPaths mention the same paths.
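Here's a quick sketch of that de-duplication on a sample array (the values are made up for the example). A Set is shown alongside as an alternative that produces the same result; it's not what our function uses.

```javascript
// Sample input with duplicates, made up for this example.
const paths = ['node_modules', 'package.json', 'node_modules', '*.tar', 'package.json'];

// The approach used in our function: keep an item only at its first index.
const viaIndexOf = paths.filter((item, itemIndex, array) => {
  return array.indexOf(item) === itemIndex;
});

// An alternative: a Set keeps unique values in insertion order.
const viaSet = [...new Set(paths)];

console.log(viaIndexOf); // ['node_modules', 'package.json', '*.tar']
console.log(viaSet);     // ['node_modules', 'package.json', '*.tar']
```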

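Finally, the "tricky" part. To pass several paths to tar's --exclude flag in one go, we lean on the shell's brace expansion, which means our list needs to look something like this:

{"node_modules","anotherPath","testing.txt"}

So it's easy to pass off to our call to tar, before we return our list, we format it into a string following this pattern (a list of comma-separated, quoted strings with no spaces between them, wrapped with curly braces {}). When the command runs, the shell expands this into one --exclude flag per entry. Keep in mind that this depends on the shell supporting brace expansion (Bash does; a stricter /bin/sh such as dash does not).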
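To make the formatting concrete, here's a sketch of the string being built from a sample array. Because the curly-brace form relies on the shell expanding it into separate flags, the sketch also shows a hypothetical alternative helper, buildExcludeArgs (not part of this tutorial's code), which produces one --exclude argument per entry without depending on the shell.

```javascript
// Sample ignore list, made up for this example.
const ignoreFiles = ['node_modules', 'package.json', '*.tar'];

// The curly-brace string our function returns.
const excludeList = `{${ignoreFiles.map((ignoreFile) => `"${ignoreFile}"`).join(',')}}`;
console.log(excludeList); // {"node_modules","package.json","*.tar"}

// Hypothetical alternative: one --exclude argument per entry.
const buildExcludeArgs = (paths = []) => {
  return paths.map((path) => `--exclude=${path}`);
};

console.log(buildExcludeArgs(ignoreFiles));
// ['--exclude=node_modules', '--exclude=package.json', '--exclude=*.tar']
```

An array like this could be handed to child_process.execFileSync(), which runs tar directly with a list of arguments instead of going through a shell.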

With that built, we return that string excludeList from our function. Now, let's see how we connect this with the generation of our .tar archive.

Creating a Tar archive

Creating the archive is fairly straightforward. Moving back to the /index.js file we created earlier, let's build out the actual .tar creation:

index.js

import child_process from 'child_process';
import getTarExcludeList from './getTarExcludeList.js';

const excludeList = getTarExcludeList([
  'package.json',
  'package-lock.json',
]);

child_process.execSync(`tar -cJf archive.tar.xz --exclude=${excludeList} ./data`);

This is all we need. Up top, we import the native Node.js child_process package, followed by the getTarExcludeList function we just wrote. First, we call that function, passing in an array with two files to combine with the entries from our .gitignore file: package.json and package-lock.json.

In return, we expect back our formatted excludeList string. To use it, below we call to child_process.execSync() passing the following command:

`tar -cJf archive.tar.xz --exclude=${excludeList} ./data`

In case it's not clear, the backticks ` ` here denote a string that has interpolated values inside (the ones wrapped with ${}). For the command, we call to the tar command which should be installed on your operating system. Immediately after this, we pass the flags -cJf where -c stands for "create," -J tells tar to compress the archive with xz (without it, tar writes an uncompressed archive regardless of the .tar.xz file name), and the f immediately after it marks the file name we want to use, archive.tar.xz (we could technically write this like -c -J -f archive.tar.xz but we use the short-hand version to keep our command compact).

Next, we pass the --exclude flag followed by =${excludeList} which is where we pass in our formatted exclude list. Finally, after all of our flags, we pass the path to the file(s) or directories we want to archive, in this case, our data folder (we use ./data here to ensure we're referencing the data folder at the root of our project).

That's it! If we open up our terminal/command line and run node index.js from the root of our project, we should see an archive.tar.xz spit out.
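To confirm the exclude list did its job, one option is to list the archive's contents using tar's -t ("list") flag. This is a minimal sketch, assuming archive.tar.xz exists in the current directory and tar is available on your system; listArchive is a hypothetical helper name, not part of the code above.

```javascript
import child_process from 'child_process';
import fs from 'fs';

// Hypothetical helper: returns the paths stored in an archive, one per entry.
const listArchive = (archivePath) => {
  // execFileSync runs tar directly (no shell); -t lists entries, -f names the archive.
  const output = child_process.execFileSync('tar', ['-tf', archivePath], { encoding: 'utf-8' });
  return output.split('\n').filter((entry) => entry.trim() !== '');
};

// Only attempt the listing if the archive has actually been created.
if (fs.existsSync('archive.tar.xz')) {
  const entries = listArchive('archive.tar.xz');
  console.log(entries);
  console.log('node_modules excluded:', !entries.some((entry) => entry.includes('node_modules')));
}
```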

Wrapping up

In this tutorial, we learned how to generate a Tar archive using Node.js. To do it, we first learned how to create a function that helped us generate our exclude list, and then, how to call to the tar command installed on our computer using child_process.execSync().

Written By
Ryan Glover

CEO/CTO @ CheatCode