tutorial // Feb 25, 2022

How to Clone and Sync a Github Repo via Node.js

How to use the git clone command via child_process.execSync() in Node.js to clone a Github repo and sync the latest changes programmatically.

Getting Started

Because the code we're writing for this tutorial is "standalone" (meaning it's not part of a bigger app or project), we're going to create a Node.js project from scratch. If you don't already have Node.js installed on your computer, read this tutorial first and then come back here.

Once you have Node.js installed on your computer, from your projects folder on your computer (e.g., ~/projects), create a new folder for our work:

Terminal

mkdir clone

Next, cd into that directory and create an index.js file (this is where we'll write our code for the tutorial):

Terminal

cd clone && touch index.js

Next, we want to install two dependencies, dotenv and express:

Terminal

npm i dotenv express

The first will give us access to the dotenv package which helps us to set environment variables on the Node.js process.env object and the second, Express, will be used to spin up a demo server.

One last step: in the package.json file that was created for you, make sure to add the field "type": "module" as a property. This will enable ESModules support and allow us to use the import statements shown in the code below.

With that in place, we're ready to get started.

Obtaining a personal access token from Github

Before we dig into the code, we want to obtain a Personal Access Token from Github. This will allow us to clone both public and private repositories using the pattern we'll learn below.

If you don't already have a Github account, you can sign up at this link. If you do have an account, make sure you're logged in and then click on your avatar in the top right-hand side of the navigation and from the menu that pops up, select the "Settings" option near the bottom of the menu.

On the next page, from the left-hand navigation, near the bottom, select the "Developer Settings" option. On the next page, from the left-hand navigation, select the "Personal Access Tokens" option. Finally, from the resulting page, click the "Generate new token" button.

On the next page, in the "Note" field, give the token a name relative to the app you're building (e.g., "clone repo tutorial" or "repo cloner").

For "Expiration," set whatever value you think is appropriate. If you're just implementing this tutorial for fun, it's wise to set this to the lowest possible value.

Under "Select scopes", check the box next to "repo" to select all of the repo-related scopes. These "scopes" tell Github what you have access to when using this token. Only "repo" is necessary for this tutorial, but feel free to customize your token's scopes to meet the needs of your app.

Finally, at the bottom of the screen, click the green "Generate token" button.

Note: pay attention here. Once your token is generated, it will be displayed temporarily in a light green box with a copy button next to it. Github will not show you this token again. It's recommended that you copy it and store it in a password manager using a name like "Github Personal Access Token <note>" where <note> should be replaced by the name you typed into the "Note" field on the previous page.

Once you have your token stored securely, we're ready to jump into the code.

Setting up a .env file

Earlier, we installed a package called dotenv. This package is designed to help you load environment variables onto the process.env object in Node.js. To do it, dotenv asks you to supply a .env file at the root of your project. Using the Personal Access Token we just generated on Github, we want to create this .env file at the root of our project and add the following:

.env

PERSONAL_ACCESS_TOKEN="<Paste Your Token Here>"

In this file, we want to add a single line PERSONAL_ACCESS_TOKEN="", pasting in the token we obtained from Github in the double-quotes. Next, we want to open up the index.js file at the root of our project and add the following:

/index.js

import 'dotenv/config';

Note: this must be at the very top of our file. When this code runs, it will call the config() function in the dotenv package which will locate the .env file we just created and load its contents onto process.env. Once this is complete, we can expect to have a value like process.env.PERSONAL_ACCESS_TOKEN available in our app.
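As a small guard, the sketch below fails fast when the token is missing instead of letting a later git command fail with a less obvious error. Note that requireEnv is a hypothetical helper written for this example, not something the dotenv package provides. It reads from whatever object it is given, which is process.env by default.

```javascript
// Hypothetical helper (not part of dotenv): read a required environment
// variable and throw a descriptive error if it is missing or empty.
function requireEnv(name, env = process.env) {
  const value = env[name];

  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }

  return value;
}

// Usage, assuming 'dotenv/config' has already been imported:
// const token = requireEnv('PERSONAL_ACCESS_TOKEN');
```

Calling this once at startup means a missing or empty .env entry is reported immediately, before any request comes in.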

That's it for now. We'll put this value to use later. Next, still in the index.js file, we want to set up the skeleton for an Express.js server.

Setting up an Express server and route

To trigger a clone of a repo, we want to set up an Express.js server with a route that we can visit in a browser, specifying the Github username, repo, and (optionally) branch name we want to clone.

/index.js

import 'dotenv/config';
import express from "express";

const app = express();

app.get('/repos/clone/:username/:repo', (req, res) => {
  // We'll handle the clone here...
});

app.listen(3000, () => {
  console.log('App running at http://localhost:3000');
});

Directly beneath our import 'dotenv/config'; line, we want to import express from the express package we installed earlier. Just below this, we create an Express server instance by calling the exported express() function and store the resulting instance in a variable app.

app represents our Express server instance. On it, we want to call two methods: .get() and .listen(). The .get() method allows us to define a route which specifies a URL pattern along with a handler function to be called when the URL of a request to our server matches that pattern.

Here, we call app.get() passing in that URL pattern as a string /repos/clone/:username/:repo, where :username and :repo are what are known as route parameters. These are "variables" in our URL and allow us to reuse the same URL pattern while expecting different inputs.

For example, this route will be accessible as /repos/clone/cheatcode/joystick or /repos/clone/motdotla/dotenv or even /repos/clone/microsoft/vscode. In that last example, microsoft would be recognized as the username and vscode would be recognized as the repo.
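To make the idea of route parameters concrete, here is a simplified illustration of how a pattern with :name segments can capture values from a URL path. The matchRoute function is hypothetical and exists only for this example. Express relies on its own, more capable matcher, so treat this as a mental model rather than its implementation.

```javascript
// Illustration only: a tiny matcher that mimics how ":name" segments in a
// pattern capture values from a URL path. This is not Express's code.
function matchRoute(pattern, path) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = path.split('/').filter(Boolean);

  // A different number of segments can never match.
  if (patternParts.length !== pathParts.length) return null;

  const params = {};

  for (let i = 0; i < patternParts.length; i += 1) {
    const expected = patternParts[i];
    const actual = pathParts[i];

    if (expected.startsWith(':')) {
      // A ":name" segment matches anything and stores it under that name.
      params[expected.slice(1)] = decodeURIComponent(actual);
    } else if (expected !== actual) {
      // A literal segment must match exactly.
      return null;
    }
  }

  return params;
}

const example = matchRoute('/repos/clone/:username/:repo', '/repos/clone/microsoft/vscode');
console.log(example); // { username: 'microsoft', repo: 'vscode' }
```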

Before we write the code for cloning our repo inside of the handler function assigned as the second argument to app.get(), at the bottom of our file, we want to make sure we start our Express.js server, giving it a port number to run on. To do it, we call app.listen(), passing the port number we want to use as the first argument. As the second argument, we pass a callback function to fire after the server has been started (we add a console.log() to signal the startup back to us in our terminal).

/index.js

import 'dotenv/config';
import express from "express";
import fs from 'fs';
import cloneAndPullRepo from './cloneAndPullRepo.js';

const app = express();

app.get('/repos/clone/:username/:repo', (req, res) => {
  const username = req?.params?.username;
  const repo = req?.params?.repo;
  const repoPath = `${username}/${repo}`;
  const repoExists = fs.existsSync(`repos/${repoPath}`);
  const confirmation = repoExists ? `Pulling ${repoPath}...` : `Cloning ${repoPath}...`;

  cloneAndPullRepo(repoExists, username, repo, req?.query?.branch);
  
  res.status(200).send(confirmation);
});

app.listen(3000, () => {
  console.log('App running at http://localhost:3000');
});

Getting to work on our actual implementation, we want to focus our attention just inside the handler function passed as the second argument to app.get().

Here, we're organizing the information we'll need to perform our clone. From our route parameters (here, "params"), we want to get the username and repo portions of our URL. To do it, we just access the req.params object automatically provided to us by Express. We expect req.params.username and req.params.repo to be defined because we can see those params being declared in our URL (anything prefixed with a : colon in our URL is captured as a param).

Here, we store the username and repo from req.params in variables of the same name. With these, next, we set up the repoPath which is a combination of the username and repo, separated by a / forward slash (mimicking a URL you'd visit on Github).

With this information, next, we check whether a folder for this repo already exists inside the repos folder at the root of our project, where we intend to store every repo we clone (the repos folder doesn't exist yet, but Git will create it automatically the first time we clone a repo).

On the next line, if it does exist, we want to signal back to the request that we're pulling the repo (meaning, pulling the latest changes) and if it does not exist, we want to signal back that we're cloning it for the first time. We store the string that describes either scenario in a variable confirmation.

We can see that this confirmation variable is sent back to the original request via the res object given to us by Express. Here, we say "set the HTTP status code to 200 (success) and then send the confirmation string back as the response body."

Just above this is the part we care about: a call to cloneAndPullRepo(), a function we haven't written yet, which will take in the variables we just defined and either clone a new repo or pull changes for an existing one. Notice that we pass our repoExists, username, and repo variables as the first three arguments, but we've added an additional one on the end.

Optionally, we want to enable our users to pull a specific branch for their repo. Because this is optional (meaning it may or may not exist), we want to support this as a query parameter. This is different from a route parameter in that it does not dictate whether or not the route matches a URL. It's simply added on the end of the URL as metadata (e.g., /repos/clone/cheatcode/joystick?branch=development).

Just like route params, however, Express parses these query params for us as well, storing them in the req.query object. To the anticipated cloneAndPullRepo() function, we pass req.query.branch as the final argument.
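Since these values come straight from the URL and will later be interpolated into a shell command, it's worth checking them before use. The sketch below shows one conservative way to do that. The function names and the allowed character sets are assumptions made for this example (they are not an official description of what Github accepts), so adjust them to your needs.

```javascript
// Hypothetical validators: accept only characters that are harmless in a
// shell command and a file path. The rules are deliberately conservative.
const SAFE_NAME = /^[A-Za-z0-9._-]+$/;
const SAFE_BRANCH = /^[A-Za-z0-9._\/-]+$/;

function isSafeName(value) {
  // The typeof check also rejects arrays, which Express produces when a
  // query parameter is repeated (e.g., ?branch=a&branch=b).
  return (
    typeof value === 'string' &&
    SAFE_NAME.test(value) &&
    value !== '.' &&
    value !== '..'
  );
}

function isSafeBranch(value) {
  return (
    typeof value === 'string' &&
    SAFE_BRANCH.test(value) &&
    !value.startsWith('-') && // would be parsed by git as an option
    !value.includes('..')
  );
}

// Possible use inside the route handler (assumed wiring):
// if (!isSafeName(username) || !isSafeName(repo)) {
//   return res.status(400).send('Invalid username or repo.');
// }
```

Rejecting unexpected input with a 400 response keeps a request like /repos/clone/cheatcode/joystick;some-command from ever reaching the command line.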

With all of that in place, let's jump over to the cloning and pulling step. We want to create the file we imported near the top of index.js: cloneAndPullRepo.js.

Wiring up a function for cloning and pulling

Now, in a new file, we want to wire up a function responsible for performing either the clone or pull of our repo.

/cloneAndPullRepo.js

import child_process from 'child_process';

export default (repoExists = false, username = '', repo = '', branch = 'master') => {
  if (!repoExists) {
    child_process.execSync(`git clone https://${username}:${process.env.PERSONAL_ACCESS_TOKEN}@github.com/${username}/${repo}.git repos/${username}/${repo}`);
  } else {
    child_process.execSync(`cd repos/${username}/${repo} && git pull origin ${branch} --rebase`);
  }
}

Because the code is limited, we've added the full source of the file here. Let's step through it.

First, beneath our import, we create a default export of a function (this is the one we anticipated back in index.js). That function takes four arguments: repoExists (whether or not the repo has already been cloned), the username that owns the repo, the name of the repo, and, optionally, a branch.

For each argument, we set a default value, the important two being repoExists which is set by default to false and branch which by default is set to master.

At the top of the file, we import child_process from the built-in Node.js child_process module. Inside the function, if repoExists is false, we call child_process.execSync(), which lets us run operating system commands from within Node.js (as though we'd typed them into a terminal window).

Here, execSync is the synchronous version of child_process.exec(). It blocks until the command finishes, which guarantees the clone has completed before our route responds. The trade-off is that Node.js can't do anything else while the command runs, so in a real app you may want to use the asynchronous .exec() method instead.
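For reference, a non-blocking variant might look like the sketch below. Note that it swaps in execFile (wrapped with util.promisify) rather than exec, because execFile takes its arguments as an array and doesn't go through a shell. The run helper is a hypothetical name introduced for this example.

```javascript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical helper: run a program without blocking the event loop.
// Arguments are passed as an array, so no shell is involved.
async function run(command, args = [], options = {}) {
  const { stdout } = await execFileAsync(command, args, options);
  return stdout.trim();
}

// Usage (assumes git is installed and on the PATH):
// const version = await run('git', ['--version']);
```

Because run returns a promise, a route handler using it would need to be an async function and await the result before sending its response.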

Focusing on what we pass to .execSync(), we pass a long command using JavaScript string interpolation to embed our variables in the git clone command we want to run:

`git clone https://${username}:${process.env.PERSONAL_ACCESS_TOKEN}@github.com/${username}/${repo}.git repos/${username}/${repo}`

Most of this should be self-explanatory, but we want to call attention to the process.env.PERSONAL_ACCESS_TOKEN part. This is the value we set earlier via the dotenv package and our .env file. Here, we pass it as the password we want to authenticate our git clone request with (Github will recognize this access token thanks to its ghp_ prefix and associate it with our account).

As an example, assuming we visited the URL http://localhost:3000/repos/clone/cheatcode/joystick in our browser, we'd expect the above code to generate a string like this:

git clone https://cheatcode:ghp_xxx@github.com/cheatcode/joystick.git repos/cheatcode/joystick

What this line now says is "we want to clone the cheatcode/joystick repo using the username cheatcode with the password ghp_xxx into the repos/cheatcode/joystick folder in our app."

When this runs, Git will notice that the repos folder doesn't exist yet and create it, along with a folder for our username cheatcode and then within that, a folder with our repo name (where our project's code will be cloned).
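As an alternative to building one long string, the same command can be assembled as a list of arguments. The sketch below is a hypothetical helper (buildCloneArgs is not part of the tutorial's code) whose result could be passed to child_process.execFileSync('git', args), which runs git directly without a shell. It uses the built-in URL class so the username and token are percent-encoded if they ever contain special characters.

```javascript
// Hypothetical helper: build the argument list for `git clone` so it can be
// passed to child_process.execFileSync('git', args) without a shell.
function buildCloneArgs(username, repo, token) {
  // The URL class takes care of percent-encoding the credentials.
  const url = new URL(`https://github.com/${username}/${repo}.git`);
  url.username = username;
  url.password = token;

  return ['clone', url.toString(), `repos/${username}/${repo}`];
}

const cloneArgs = buildCloneArgs('cheatcode', 'joystick', 'ghp_xxx');
// cloneArgs contains:
//   'clone'
//   'https://cheatcode:ghp_xxx@github.com/cheatcode/joystick.git'
//   'repos/cheatcode/joystick'
```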

/cloneAndPullRepo.js

import child_process from 'child_process';

export default (repoExists = false, username = '', repo = '', branch = 'master') => {
  if (!repoExists) {
    child_process.execSync(`git clone https://${username}:${process.env.PERSONAL_ACCESS_TOKEN}@github.com/${username}/${repo}.git repos/${username}/${repo}`);
  } else {
    child_process.execSync(`cd repos/${username}/${repo} && git pull origin ${branch} --rebase`);
  }
}

Focusing on the second part of the function, if repoExists is true, we fall back to the else block. Again we use .execSync(), but this time we run two commands: cd to "change directories" into the existing repos/username/repo folder, and then git pull origin ${branch} --rebase to pull the latest changes for the specified branch (either the default master or whatever was passed as a query param to our URL). We don't need to pass the token again because Git saved the remote URL, token included, when it cloned the repo.
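One possible variation on the pull step is sketched below. Instead of chaining cd and git pull in a single shell string, it passes the folder via the cwd option of execFileSync. It also omits the branch when none is given, so git pulls from the current branch's upstream rather than assuming master. The pullRepo name and the injectable run parameter are assumptions made so the example can be exercised without Git.

```javascript
import { execFileSync } from 'node:child_process';

// Hypothetical variant of the pull step. `run` defaults to execFileSync and
// is injectable only so the example can be exercised without Git.
function pullRepo(username, repo, branch, run = execFileSync) {
  // With no branch given, `git pull` uses the current branch's upstream,
  // so repos whose default branch is not "master" still work.
  const args = branch
    ? ['pull', 'origin', branch, '--rebase']
    : ['pull', '--rebase'];

  // `cwd` replaces the `cd ... &&` prefix of the shell command.
  return run('git', args, { cwd: `repos/${username}/${repo}` });
}

// Usage (assumes the repo was cloned into repos/cheatcode/joystick):
// pullRepo('cheatcode', 'joystick', 'development');
```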

That's it. With all of this in place, now, if we start up our app and pass the username and repo name of an existing Github repository in our URL (either one that's public, or, if private, one that we have access to), we should trigger the cloneAndPullRepo() function and see the repo downloaded into our project.
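If the clone or pull fails (for example, because the repo doesn't exist or the token is invalid), execSync throws an error. The sketch below shows one way the route could report that explicitly. The createCloneHandler factory is hypothetical: it receives its dependencies as arguments purely so the failure path is easy to follow and to try out without Express or Git.

```javascript
// Hypothetical factory: builds the route handler from its dependencies so
// the failure path is easy to see (and to test without Express or Git).
function createCloneHandler({ repoExists, cloneAndPullRepo }) {
  return (req, res) => {
    const { username, repo } = req.params;
    const repoPath = `${username}/${repo}`;
    const exists = repoExists(`repos/${repoPath}`);

    try {
      // execSync throws if git exits with a non-zero status.
      cloneAndPullRepo(exists, username, repo, req.query.branch);
      res.status(200).send(exists ? `Pulled ${repoPath}.` : `Cloned ${repoPath}.`);
    } catch (error) {
      // Avoid echoing error.message: it includes the command that was run,
      // and that command contains the access token.
      res.status(500).send(`Failed to sync ${repoPath}.`);
    }
  };
}

// Usage in index.js (assumed wiring):
// app.get('/repos/clone/:username/:repo',
//   createCloneHandler({ repoExists: fs.existsSync, cloneAndPullRepo }));
```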

Wrapping up

In this tutorial, we learned how to clone a Github repo using Node.js. We learned how to set up an Express.js server, along with a route where we could call a function that either cloned a new repo or pulled an existing one. To do that clone or pull, we learned how to use the child_process.execSync() function.

Written By
Ryan Glover

CEO/CTO @ CheatCode