...
Computational jobs on the AI cluster are managed with the SLURM job manager. We provide an in-depth tutorial on how to use SLURM <placeholder>, but this section covers some basic examples that are immediately applicable on the AI cluster.
Important notice:

**Warning:** Do not run computations on login nodes.
...
Compile the example code with OpenMP support enabled:

```bash
gcc -fopenmp code.c
```
This will generate an executable a.out that we will use to illustrate how to submit jobs. The code accepts two arguments: the number of Monte Carlo samples and the number of parallel threads to use. It can be executed as ./a.out 1000 8 (1000 samples, run with 8 parallel threads).
SLURM Batch job example
Here is an example of a batch script that can be run on the cluster. In this script we are requesting a single node with 4 CPU cores, used in SMP mode.
```bash
#!/bin/bash
#SBATCH --job-name=<jobname>        # give your job a name
#SBATCH --nodes=1                   # asking for 1 compute node
#SBATCH --ntasks=1                  # 1 task
#SBATCH --cpus-per-task=4           # 4 CPU cores per task, so 4 cores in total
#SBATCH --time=00:30:00             # set this time according to your need, 30 minutes here
#SBATCH --mem=8GB                   # request appropriate amount of RAM
##SBATCH --gres=gpu:1               # if you need to use a GPU; note that this line is commented out
#SBATCH -p <partition_name>         # specify your partition

cd <code_directory>                 # make sure to run in the directory containing a.out
export OMP_NUM_THREADS=4            # OMP_NUM_THREADS should be equal to the number of cores you are requesting
./a.out 1000000000 4                # running 1000000000 samples on 4 cores
```
Submit this job to the queue and inspect the output once it’s finished:
```bash
sbatch script_name
```
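If the submission succeeds, sbatch prints the assigned job ID, and by default the job's output is written to a file named slurm-<jobid>.out in the directory you submitted from (unless you override this with the --output option). For example:

```bash
sbatch script_name       # prints: Submitted batch job <jobid>
cat slurm-<jobid>.out    # inspect the output once the job has finished
```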
You can view your jobs in the queue with:
```bash
squeue -u <cwid>
```
Try running a few jobs with different numbers of cores and see how the performance scales almost linearly; one way to script such a scaling test is sketched below.
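As a sketch (assuming you adapt the batch script so the thread count follows the allocation instead of being hard-coded, and assuming a script file named scaling_test.sh), you can submit the same script with several core counts from the command line. Options passed to sbatch on the command line override the matching #SBATCH directives in the script.

```bash
# Inside the batch script, derive the thread count from the allocation:
#   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
#   ./a.out 1000000000 $SLURM_CPUS_PER_TASK

# Then submit the same script with different core counts (scaling_test.sh is hypothetical):
for n in 1 2 4 8; do
    sbatch --cpus-per-task=$n scaling_test.sh
done
```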
**Info:** The more resources you request from SLURM, the harder it is for SLURM to allocate space for your job. For parallel jobs, more CPUs means a faster run, but the job may sit in the queue for longer. Be aware of this trade-off; there is no universal answer for the best strategy, as it usually depends on what resources your particular job needs and how busy the cluster is at the moment.
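If you want a rough idea of how long a pending job may wait, SLURM can report an estimated start time (the estimate depends on the scheduler's current plan and may change):

```bash
# show estimated start times for your pending jobs
squeue -u <cwid> --start
```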
SLURM interactive job example
Even though interactive jobs are inefficient and not recommended, sometimes there is no other way to do certain things. If you need to run an interactive job, here is how it can be done:
```bash
srun --nodes 1 \
     --tasks-per-node 1 \
     --cpus-per-task 4 \
     --partition=<partition_name> \
     --gres=gpu:1 \
     --pty /bin/bash -i
```
Once the job is successfully started, you will be dropped into an interactive bash session on one of the compute nodes. The same scheduling considerations apply here: the more resources you request, the longer the potential wait time.
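Once inside the session, you can check that you got what you asked for (a quick sketch; nvidia-smi assumes an NVIDIA GPU node with the driver tools installed):

```bash
echo $SLURM_JOB_ID $SLURM_CPUS_PER_TASK   # SLURM sets these variables inside the job
nvidia-smi                                # confirm the requested GPU is visible
```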
Once you are done with your interactive work, simply run the exit command. It will terminate your bash process, and therefore the whole SLURM job will be cancelled.
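If an interactive session is left hanging (for example, after a dropped connection), you can also find and cancel the job from a login node:

```bash
squeue -u <cwid>      # find the job ID of the lingering session
scancel <jobid>       # cancel it explicitly
```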