tutorial // Mar 17, 2023
How to Create a Tar Archive Using Node.js
How to use the tar command line tool to generate a tarball archive using Node.js.
Getting Started
Because the code we're writing for this tutorial is "standalone" (meaning it's not part of a bigger app or project), we're going to create a Node.js project from scratch. If you don't already have Node.js installed on your computer, read this tutorial first and then come back here.
Once you have Node.js installed, create a new folder for our work inside your projects folder (e.g., ~/projects):
Terminal
mkdir app
Next, cd into that directory and create an index.js file (this is where we'll write our code for the tutorial):
Terminal
cd app && touch index.js
One last step: make sure there's a package.json file at the root of the project (if you don't have one yet, running npm init -f in the app folder will create it), and add the field "type": "module" as a property. This will enable ESModules support and allow us to use the import statements shown in the code below.
With that in place, we're ready to get started.
Building out our test data
Our goal for this tutorial will be to create a .tar archive containing a folder inside of our project that we're going to create now called /data. While we can technically pass any path we want to create our .tar archive (often referred to as a "tarball") from, we're going to use this /data directory to make things a bit clearer. Inside, we want to create four (4) empty text files, aiming for a structure like this:
├── /data
│   ├── file1.txt
│   ├── file2.txt
│   ├── file3.txt
│   └── file4.txt
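If you'd rather script this step than create the files by hand, here is a minimal sketch using Node's built-in fs and path modules. The createTestData helper and its parameters are hypothetical names used for illustration; they are not part of the tutorial's project files.

```javascript
import fs from 'fs';
import path from 'path';

// Hypothetical helper: creates `directory` (if needed) and `count` empty
// text files named file1.txt, file2.txt, and so on inside of it.
const createTestData = (directory = 'data', count = 4) => {
  fs.mkdirSync(directory, { recursive: true });

  const createdFiles = [];

  for (let index = 1; index <= count; index += 1) {
    const filePath = path.join(directory, `file${index}.txt`);
    // The 'a' (append) flag creates the file if it's missing without
    // truncating a file that already exists.
    fs.writeFileSync(filePath, '', { flag: 'a' });
    createdFiles.push(filePath);
  }

  return createdFiles;
};

console.log(createTestData());
```

Run from the root of the project, this should leave a data folder containing the four empty files.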
To add some context and help us understand a more complex archive, next, inside of the /data folder, we want to create a package.json file and install the package dayjs. While we won't be writing any code in here, we want to demonstrate how to ignore folders like node_modules when we create our archive.
Terminal
cd data && npm init -f && npm i dayjs
This changes into the data directory, initializes a package.json file there, and installs dayjs.
Before we move on, outside of our /data folder (at the root of our project), we want to create a .gitignore file. Inside, we want to add the following:
.gitignore
node_modules
Here, we're specifying that in our Git repository (if we had one; we don't here) we'd want Git to exclude the node_modules folder. As we'll see next, we're going to leverage this file in the exclude list for our .tar archive.
With all of that scaffolded out, we're ready to dig into the code for creating our Tar archive. To start, we're going to write a function to help us build out our exclude list.
Writing a function to build our exclude list
When working with a Tar archive, an "exclude list" is just a list of files or directories to exclude from an archive. While we could technically just pass an exclude list manually to our tar call, writing this function allows us to account for a more complex setup. To start, we're going to create a separate file at /getTarExcludeList.js in our project (from our top-level project, not inside of the /data directory we created above):
/getTarExcludeList.js
import fs from 'fs';

export default (excludedPaths = []) => {
  const gitignore = fs.existsSync('.gitignore') ? fs.readFileSync('.gitignore', 'utf-8') : '';
  const gitignoreFiles = gitignore?.split('\n')?.filter((file) => {
    return !file?.includes('#') && file?.trim() !== '';
  });

  // We'll implement the rest of our function here...
};
To start, we define a function which takes an argument excludedPaths, expecting it to be an array (the = [] after the argument definition sets a default value in the event that excludedPaths is undefined; note that passing null explicitly does not trigger the default).
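Default parameter values only apply when an argument is missing or undefined, which is easy to confirm with a small experiment. The countPaths and countPathsSafely functions below are hypothetical and exist only to illustrate the behavior.

```javascript
// Hypothetical function using the same default-value pattern as our
// exclude list function.
const countPaths = (excludedPaths = []) => {
  return Array.isArray(excludedPaths) ? excludedPaths.length : 'not an array';
};

console.log(countPaths());          // 0
console.log(countPaths(undefined)); // 0
console.log(countPaths(null));      // not an array
console.log(countPaths(['a.txt'])); // 1

// If null is a realistic input, one option is to fall back explicitly
// with the nullish coalescing operator instead of a default value.
const countPathsSafely = (excludedPaths) => (excludedPaths ?? []).length;

console.log(countPathsSafely(null)); // 0
```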
Next, inside of the function body, we start by retrieving the .gitignore file from the project, if it exists (in this case it does, but we want to write code that won't break in the event it's not there). To do that, we use the built-in Node.js fs package (imported up top), starting first with the fs.existsSync() function to make sure the file exists. If it does, using a ternary operator, we read it as a utf-8 formatted string with fs.readFileSync(). In the event that it doesn't exist, we fall back to an empty string.
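To see the "read it if it's there, otherwise use an empty string" pattern in isolation, here is a small sketch. The readFileIfExists helper and the does-not-exist.txt file name are hypothetical and only used for illustration.

```javascript
import fs from 'fs';

// Hypothetical helper: returns the file's contents as a utf-8 string,
// or an empty string when no file exists at that path.
const readFileIfExists = (filePath) => {
  return fs.existsSync(filePath) ? fs.readFileSync(filePath, 'utf-8') : '';
};

// Prints the contents of .gitignore as a JSON string if the file exists
// in the current directory, or "" if it doesn't.
console.log(JSON.stringify(readFileIfExists('.gitignore')));

// A missing file gives us an empty string instead of a thrown error.
console.log(JSON.stringify(readFileIfExists('does-not-exist.txt'))); // ""
```

Because the fallback is also a string, the .split() call that follows works the same way in both cases.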
Next, assuming we found a .gitignore, we call the .split() method on it, "splitting" the string into an array of strings using the \n or "newline" character as our split point (i.e., every time we hit a newline character, start a new item/string in the array). So that's clear, we expect that .split() to return something like this given our current .gitignore file:
['node_modules']
Next, on this resulting array, we want to run a filter to exclude two potential bits:
- Lines including a # hash, which indicates a comment line in a .gitignore file.
- Lines that are blank/empty.
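Assuming that the file we're currently iterating over with .filter() doesn't include a # and isn't empty, we include it in the final array we store in gitignoreFiles, otherwise we discard it. Keep in mind that this check drops any line containing a # anywhere in it, not only lines that start with one.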
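To make the split and filter steps concrete, here is a sketch that runs the same logic against a hard-coded string instead of a real file. The parseIgnoreLines helper and the sample contents are assumptions made for illustration.

```javascript
// Hypothetical helper applying the same split + filter logic described
// above to any string of .gitignore-style contents.
const parseIgnoreLines = (contents = '') => {
  return contents.split('\n').filter((line) => {
    return !line.includes('#') && line.trim() !== '';
  });
};

// A made-up .gitignore with a comment, a blank line, and a trailing newline.
const sample = '# dependencies\nnode_modules\n\n.env\n';

console.log(parseIgnoreLines(sample));
// [ 'node_modules', '.env' ]
```

One thing this sketch does not handle: a file saved with Windows-style \r\n line endings would leave a trailing \r on each entry, so splitting on /\r?\n/ instead may be worth considering if that applies to your setup.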
getTarExcludeList.js
import fs from 'fs';

export default (excludedPaths = []) => {
  const gitignore = fs.existsSync('.gitignore') ? fs.readFileSync('.gitignore', 'utf-8') : '';
  const gitignoreFiles = gitignore?.split('\n')?.filter((file) => {
    return !file?.includes('#') && file?.trim() !== '';
  });

  const ignoreFiles = [
    ...gitignoreFiles,
    ...excludedPaths,
    "*.tar",
    "*.tar.gz",
    "*.tar.xz"
  ]?.filter((item, itemIndex, array) => {
    return array.indexOf(item) === itemIndex;
  });

  const excludeList = `{${ignoreFiles?.map((ignoreFile) => `"${ignoreFile}"`).join(',')}}`;

  return excludeList;
};
Below this, we build out our master ignoreFiles array, combining all of the entries from our .gitignore file, any excludedPaths passed in, and three wildcard selectors which will match any .tar archives (we want to avoid a "turtles all the way down" scenario where we accidentally re-archive an existing archive).
On the end of that array definition, we chain on another .filter(), this time filtering out any duplicate entries in the event that our .gitignore file and excludedPaths mention the same paths.
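The indexOf() check works because indexOf() always returns the position of the first match, so any later copy of the same value fails the comparison. Here is a short sketch with made-up input, alongside a Set-based alternative that gives the same result for strings:

```javascript
// Made-up input with two duplicated entries.
const combined = ['node_modules', 'package.json', 'node_modules', '*.tar', 'package.json'];

// Keep an item only if this is the first position where it appears.
const viaFilter = combined.filter((item, itemIndex, array) => {
  return array.indexOf(item) === itemIndex;
});

// Alternative: a Set only stores unique values and preserves insertion order.
const viaSet = [...new Set(combined)];

console.log(viaFilter); // [ 'node_modules', 'package.json', '*.tar' ]
console.log(viaSet);    // [ 'node_modules', 'package.json', '*.tar' ]
```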
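Finally, the "tricky" part. To pass several exclusions in one go, we format our list like this:
{"node_modules","anotherPath","testing.txt"}
This pattern (a list of comma-separated, quoted strings with no spaces between them, wrapped with curly braces {}) isn't something tar parses itself. It relies on the shell's brace expansion, which turns --exclude={"a","b"} into --exclude=a --exclude=b before tar ever runs. Shells like bash and zsh support this, but child_process.execSync() runs commands with /bin/sh by default on Unix-like systems, and some /bin/sh implementations (dash, for example) don't perform brace expansion, in which case the exclusions may not be applied. So we can make it easy to pass off to our call to tar, before we return our list, we format it into a string following this pattern.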
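To inspect what that formatting step produces, here is the same expression pulled out into a standalone sketch. The formatExcludeList helper name and the sample patterns are assumptions made for illustration.

```javascript
// Hypothetical helper containing the same formatting expression used
// at the end of getTarExcludeList.js.
const formatExcludeList = (patterns = []) => {
  return `{${patterns.map((pattern) => `"${pattern}"`).join(',')}}`;
};

const excludeList = formatExcludeList(['node_modules', 'package.json', '*.tar']);

console.log(excludeList);
// {"node_modules","package.json","*.tar"}

// This is the full command string we'd end up handing to the shell.
console.log(`tar -cf archive.tar.xz --exclude=${excludeList} ./data`);
// tar -cf archive.tar.xz --exclude={"node_modules","package.json","*.tar"} ./data
```

Wrapping each entry in double quotes keeps the shell from expanding wildcards like *.tar against files in the current directory before tar sees them.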
With that built, we return that string excludeList from our function. Now, let's see how we connect this with the generation of our .tar archive.
Creating a Tar archive
Creating the archive is fairly straightforward. Moving back to the /index.js file we created earlier, let's build out the actual .tar creation:
index.js
import child_process from 'child_process';
import getTarExcludeList from './getTarExcludeList.js';
const excludeList = getTarExcludeList([
  'package.json',
  'package-lock.json',
]);
child_process.execSync(`tar -cf archive.tar.xz --exclude=${excludeList} ./data`);
This is all we need. Up top, we import the native Node.js child_process package, followed by the getTarExcludeList function we just wrote. First, we call that function, passing in an array with two files to combine with the entries from our .gitignore file: package.json and package-lock.json.
In return, we expect back our formatted excludeList string. To use it, below we call child_process.execSync(), passing the following command:
`tar -cf archive.tar.xz --exclude=${excludeList} ./data`
In case it's not clear, the backticks ` ` here denote a string that has interpolated values inside (the ones wrapped with ${}). For the command, we call the tar command, which should be installed on your operating system. Immediately after this, we pass the flag -cf, where -c stands for "create" and the f immediately after it marks the file name we want to use, archive.tar.xz (we could technically write this like -c -f archive.tar.xz but we use the short-hand version to keep our command compact). One caveat: -cf on its own writes an uncompressed archive, regardless of the .tar.xz extension. To actually compress the archive with xz, add the -J flag (e.g., -cJf), which requires xz to be available on your system.
Next, we pass the --exclude flag followed by =${excludeList}, which is where we pass in our formatted exclude list. Finally, after all of our flags, we pass the path to the file(s) or directories we want to archive, in this case, our data directory (we use ./data here to ensure we're referencing the data folder at the root of our project).
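If the shell on your system doesn't expand the curly brace list, one alternative is to skip the shell entirely and pass tar one --exclude flag per pattern using child_process.execFileSync(). This is a sketch of that approach, not the tutorial's method: buildTarArgs and createArchive are hypothetical helpers, and they take a plain array of patterns rather than the formatted string our getTarExcludeList function returns.

```javascript
import child_process from 'child_process';

// Hypothetical helper: builds the argument array for tar, adding one
// --exclude flag per pattern ahead of the path being archived.
const buildTarArgs = (archiveName, sourcePath, patterns = []) => {
  return [
    '-cf',
    archiveName,
    ...patterns.map((pattern) => `--exclude=${pattern}`),
    sourcePath,
  ];
};

// Hypothetical helper: runs tar directly (no shell involved), so wildcards
// like *.tar reach tar untouched and no quoting is needed.
const createArchive = (archiveName, sourcePath, patterns = []) => {
  child_process.execFileSync('tar', buildTarArgs(archiveName, sourcePath, patterns));
};

console.log(buildTarArgs('archive.tar', './data', ['node_modules', '*.tar']));
// [ '-cf', 'archive.tar', '--exclude=node_modules', '--exclude=*.tar', './data' ]
```

Calling createArchive('archive.tar', './data', ['node_modules', 'package.json']) from the root of the project would then run the equivalent of our command above without depending on shell behavior.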
That's it! If we open up our terminal/command line and run node index.js from the root of our project, we should see an archive.tar.xz file spit out.
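To confirm the exclusions worked, we can list the archive's contents with tar -tf (-t lists entries, -f names the archive file) and check that nothing from node_modules made it in. A minimal sketch, assuming the archive sits in the current directory; parseListing and listArchive are hypothetical helper names.

```javascript
import fs from 'fs';
import child_process from 'child_process';

// Hypothetical helper: turns tar's newline-separated listing into an array.
const parseListing = (output = '') => {
  return output.split('\n').filter((line) => line.trim() !== '');
};

// Hypothetical helper: asks tar for the list of entries in an archive.
const listArchive = (archivePath) => {
  const output = child_process.execFileSync('tar', ['-tf', archivePath], {
    encoding: 'utf-8',
  });

  return parseListing(output);
};

// Only run the check if the archive has actually been created.
if (fs.existsSync('archive.tar.xz')) {
  const entries = listArchive('archive.tar.xz');

  console.log(entries);
  console.log(
    'Contains node_modules?',
    entries.some((entry) => entry.includes('node_modules'))
  );
}
```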
Wrapping up
In this tutorial, we learned how to generate a Tar archive using Node.js. To do it, we first learned how to create a function that helped us generate our exclude list, and then, how to call the tar command installed on our computer using child_process.execSync().