Basic SLURM Commands
Cluster nodes are managed through Slurm. Running jobs on the login nodes, or accessing compute nodes directly via ssh, is strictly forbidden.
Commands | Syntax | Description
---|---|---
`sbatch` | `sbatch <script_name>` | Submit a batch script to SLURM for processing.
`squeue` | `squeue -u <cwid>` | Show information about your job(s) in the queue. When run without the `-u` flag, the command lists every job currently in the queue.
`srun` | `srun [options] --pty bash` | Run jobs interactively on the cluster.
`scancel` | `scancel <jobid>` | End or cancel a queued job.
`sacct` | `sacct` | Show information about current and previous jobs.
`sinfo` | `sinfo` | Check the status of the cluster and partitions, including availability, time limits, and the number of nodes.
Requesting Resources
General partitions available to all users on the BRB cluster:
scu-cpu: 22 cpu nodes, 7-day runtime limit
scu-gpu: 4 gpu nodes, 2-day runtime limit
Syntax: `sinfo` or `sinfo --[optional flags]`

sinfo

Output: The output below lists all of the partitions on the BRB cluster.
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
scu-cpu* up 7-00:00:00 18 mix scu-node[023,032-033,035-047,049,079]
scu-cpu* up 7-00:00:00 4 alloc scu-node[020-022,034]
scu-gpu up 2-00:00:00 4 mix scu-node[050-051,081-082]
cryo-cpu up 7-00:00:00 1 idle scu-node065
cryo-cpu up 7-00:00:00 1 idle scu-node002
cryo-cpu up 7-00:00:00 2 mix scu-node[001,064]
cryo-cpu up 7-00:00:00 10 idle scu-node[063,066-074]
cryo-gpu up 2-00:00:00 6 mix scu-node[003-008]
cryo-gpu-v100 up 2-00:00:00 3 mix scu-node[054-056]
cryo-gpu-p100 up 2-00:00:00 1 mix scu-node060
cryo-gpu-p100 up 2-00:00:00 2 idle scu-node[061-062]
boudker-cpu up 7-00:00:00 1 alloc scu-node010
boudker-cpu up 7-00:00:00 1 idle scu-node009
boudker-gpu up 7-00:00:00 2 mix scu-node[011-012]
boudker-gpu-p100 up 7-00:00:00 3 idle scu-node[057-059]
accardi-gpu up 2-00:00:00 1 mix scu-node015
accardi-gpu up 2-00:00:00 2 alloc scu-node[013-014]
accardi-gpu2 up 2-00:00:00 1 idle scu-node016
accardi-cpu up 7-00:00:00 1 idle scu-node017
sackler-gpu up 7-00:00:00 1 mix scu-node018
sackler-cpu up 7-00:00:00 1 mix scu-node019
hwlab-rocky-cpu up 7-00:00:00 3 idle scu-node[052-053,099]
hwlab-rocky-gpu up 7-00:00:00 12 mix scu-node[085-096]
scu-res up 7-00:00:00 1 idle scu-login03
eliezer-gpu up 7-00:00:00 1 idle scu-node097
Header | Description
---|---
PARTITION | The list of the cluster's partitions. A partition is a set of compute nodes grouped logically.
AVAIL | The active state of the partition (up, down, idle).
TIMELIMIT | The maximum job execution time allowed on the partition.
NODES | The total number of nodes per partition.
STATE | The current state of the nodes (e.g. alloc, idle, mix).
NODELIST | The list of nodes per partition.
To request a specific number of GPUs, add a `--gres` request to your srun/sbatch command. Below is an example of requesting 1 GPU; you can request up to 4 GPUs on a single node.

`--gres=gpu:1`
SRUN: Interactive Session
Example:
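Assembled from the flags described in the breakdown below, an interactive session request looks like this:

```bash
srun --gres=gpu:1 --partition=scu-cpu --mem=8G --cpus-per-task=4 --pty bash
```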
Breakdown:

`--gres=gpu:1`: Allocates 1 GPU to your job.
`--partition=scu-cpu`: Specifies the partition to run the job in.
`--mem=8G`: Requests 8 GB of memory.
`--cpus-per-task=4`: Requests 4 CPU cores.
`--pty bash`: Launches an interactive bash shell after resources are allocated.
SBATCH: Used to submit a job script for later execution.
The shebang (`#!`) at the beginning of a script tells the shell which interpreter to use for executing the commands. In a Slurm script, it specifies that the script should be run using the Bash shell.

In Slurm, lines beginning with `#SBATCH` are treated as directives. To comment out a Slurm directive, you need to add a second `#` at the beginning. For example, `#SBATCH` is a directive, while `##SBATCH` indicates a comment.
The `#SBATCH` lines in the script below contain directives that are recommended as defaults for all job submissions.
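A minimal sketch of such a script, assuming a single-task job on the scu-cpu partition; the job name, resource amounts, time limit, and output file below are placeholder values, not site-mandated defaults:

```bash
#!/bin/bash
#SBATCH --job-name=my_job        # job name shown in the queue (placeholder)
#SBATCH --partition=scu-cpu      # partition to submit to
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=1               # number of tasks
#SBATCH --cpus-per-task=4        # CPU cores per task
#SBATCH --mem=8G                 # memory per node
#SBATCH --time=1-00:00:00        # walltime limit (D-HH:MM:SS)
#SBATCH --output=slurm-%j.out    # output file; %j expands to the job ID

# Your commands go below the #SBATCH directives
echo "Running on $(hostname)"
```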
Additional flags that can be added to the sbatch script above:
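The exact flags depend on the job, but the following standard sbatch options are commonly added (the email address and node name are placeholders):

```bash
#SBATCH --gres=gpu:1             # request 1 GPU (up to 4 on a single node)
#SBATCH --mail-type=END,FAIL     # events that trigger an email notification
#SBATCH --mail-user=<your email> # address for notifications (placeholder)
#SBATCH --nodelist=scu-node023   # request a specific node (placeholder)
```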
Submit the batch script
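For example, assuming the script above was saved as job.sh (the filename is a placeholder):

```bash
sbatch job.sh
```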
After the job has been submitted, you should get an output similar to the one below, but with a different jobid.
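The output takes the following form; the job ID shown here is illustrative.

```
Submitted batch job 481172
```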
You can use the command below to check the progress of your submitted job in the queue.
Syntax: `squeue -u <your cwid>`

Output:
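The values below are illustrative; your job ID, name, user, and node will differ.

```
JOBID   PARTITION     NAME     USER  ST  TIME  NODES  NODELIST(REASON)
481172    scu-cpu   my_job   <cwid>   R  0:12      1  scu-node023
```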
Scontrol

The scontrol command can be used to display detailed information about a specific job.

Syntax: `scontrol show jobid <jobid>`

Output:
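An abbreviated, illustrative excerpt; the real output contains many more fields and your values will differ.

```
JobId=481172 JobName=my_job
   UserId=<cwid>(12345) GroupId=<cwid>(12345)
   JobState=RUNNING Reason=None
   RunTime=00:00:12 TimeLimit=1-00:00:00
   Partition=scu-cpu NodeList=scu-node023
   NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=4
   ...
```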
Terminating Jobs
The scancel command is used to kill or end a job in the queue, whether it is pending or running.
Syntax: `scancel <jobid>` or `skill <jobid>`
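For example, to cancel the job from the earlier output (the job ID is illustrative):

```bash
scancel 481172
```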