Running Jupyter Notebooks from a Supercomputer
6 April 2025
When running a program on an HPC cluster, the most common workflow is to prepare a launcher script and submit it to the cluster’s job scheduler. This is great for non-interactive code, but what do you do when the program is a server you need to interact with? For example, many people write machine learning code in Jupyter notebooks, which require starting a server and then accessing it manually.
Recently I had to use remote GPUs to run some ML notebooks and decided to document my efforts. If you found this post and are perhaps dealing with the same headaches, have no fear! With a couple of SSH tunnels, you will soon be using supercomputer resources from the comfort of your own browser.
As I am using Purdue’s Anvil supercomputer, this post will focus on it. Nevertheless, any cluster that manages its workload through SLURM should behave the same way.
Launching the Server
First things first, let’s connect to the cluster via SSH. I will use <USERNAME> to refer to your remote user (on the cluster) and <ALLOCATION> for the allocation code giving access to your resources.
ssh -l <USERNAME> anvil.rcac.purdue.edu
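By the way, if you expect to connect often, it may be convenient to give the cluster an alias in your ~/.ssh/config. The alias anvil below is just my choice; pick whatever you like:

Host anvil
    HostName anvil.rcac.purdue.edu
    User <USERNAME>

With that in place, a plain ssh anvil is equivalent to the full command above.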
The cluster will connect you to a login node, from where you can prepare and schedule your jobs. Check your hostname: it should start with login followed by some digits. For example, on Anvil I get
$ hostname
login05.anvil.rcac.purdue.edu
The usual SLURM workflow consists of writing a script for the application and launching it with sbatch.
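For comparison, a minimal non-interactive launcher could look something like the sketch below. Its contents are illustrative, not a recipe: the loaded module and the train.py program are stand-ins for whatever you actually run.

#!/bin/bash
#SBATCH -A <ALLOCATION>
#SBATCH --partition gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=3:00:00

# Load what the program needs and run it non-interactively.
module load python
python train.py

You would save it as, say, launcher.sh and submit it with sbatch launcher.sh.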
In our case, however, the job is interactive, so we use its cousin sinteractive to start an interactive session on a compute node. You call it by passing your allocation and specifying the resources needed for the session.
sinteractive -A <ALLOCATION> --partition gpu --nodes=1 --gpus-per-node=1 --time=3:00:00
It may take a while until the cluster assigns you the needed resources.
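While you wait, you can check on the request from another login shell with squeue, a standard SLURM command:

squeue -u <USERNAME>

A job whose state column shows PD is still pending; R means it is already running.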
When it is ready, you will be back at a shell, but now on a compute node. You can check that the hostname changed. On Anvil I get
$ hostname
a240.anvil.rcac.purdue.edu
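Since we requested a GPU, it is also worth confirming that the session can see it. Anvil’s GPU nodes carry NVIDIA cards, so nvidia-smi should list the one assigned to you:

nvidia-smi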
Now we are ready to get working! Load the modules and start the Jupyter server on a chosen port. Let’s use 8895 for didactic purposes; you can pick whatever you prefer. We also tell it not to start a web browser (no use on a remote machine) and to serve publicly by binding to the IP 0.0.0.0.
module load jupyter
jupyter notebook --port 8895 --no-browser --ip=0.0.0.0
At the end of its output, Jupyter will show a local URL with a token, like
http://127.0.0.1:8895/?token=deee923fe443decacc7ae4a127dc0a1bbe0a65b96d91e7da
Take note of it.
The server is running and we can interact with it from Anvil!
Connecting From Your Browser
Although we can already use the supercomputer’s resources to run the notebooks, we are still missing a way to interact with the server from our own machines. The solution is to connect to the cluster again, this time establishing an SSH tunnel.
Open another shell while keeping the previous one open. This is important! If you close the original connection, it will shut down your interactive session. In the new terminal, connect to the cluster again, passing the -L parameter to ssh to forward local port 8080 to port 8895 on the compute node. You can choose whatever ports you prefer, even the same one on both ends if you want to keep things simple. Note that a240.anvil.rcac.purdue.edu below is the compute node from my hostname check; substitute the one you got.
ssh -L 8080:a240.anvil.rcac.purdue.edu:8895 -l <USERNAME> anvil.rcac.purdue.edu
Finally, keeping both terminals open, we can connect to the remote Jupyter server at localhost:8080! The access token is the one you took note of earlier.
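One last tip: if you lose track of the token, you can ask Jupyter to list the running servers, with their full URLs, from the compute-node shell. Depending on your Jupyter version, one of these should work:

jupyter notebook list
jupyter server list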