Basic SLURM Commands

Each command is listed with its syntax and a short description.

  • sbatch (syntax: sbatch <job-script>): Submit a batch script to SLURM for processing.

  • squeue (syntax: squeue -u <cwid>): Show information about your job(s) in the queue. Without the -u flag, the command lists every job in the queue, yours and everyone else's.

  • srun (syntax: srun <resource-parameters>): Run jobs interactively on the cluster.

  • scancel (syntax: scancel <job-id>): End or cancel a queued job.

  • sacct (syntax: sacct): Show information about current and previous jobs.

  • sinfo (syntax: sinfo): Check the status of the cluster and its partitions, including availability, time limits, and the number of nodes.
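As a rough sketch of how sacct output can be filtered, the helper below lists jobs whose state is anything other than COMPLETED. It assumes sacct is called with -P (pipe-delimited output) and --format=JobID,JobName,State,Elapsed. The function name not_completed and the sample rows are illustrative, not real cluster data.

```shell
# Print the job ID and state of every job that is not COMPLETED.
# Expects pipe-delimited sacct output (sacct -P) with State in column 3.
not_completed() {
    awk -F'|' 'NR > 1 && $3 != "COMPLETED" { print $1, $3 }'
}

# Illustrative sample in the shape produced by:
#   sacct -P --format=JobID,JobName,State,Elapsed
not_completed <<'EOF'
JobID|JobName|State|Elapsed
1234567|train|COMPLETED|00:10:02
1234567.batch|batch|COMPLETED|00:10:02
1234568|align|FAILED|00:00:13
EOF
```

On the cluster, the same filter would be used as: sacct -P --format=JobID,JobName,State,Elapsed | not_completed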

Requesting Resources

General partitions for all users on the BRB cluster:

  • scu-cpu: 22 CPU nodes, 7-day runtime limit

  • scu-gpu: 4 GPU nodes, 2-day runtime limit

Syntax: sinfo or sinfo [optional flags]

sinfo

Output: The output below lists all partitions on the BRB cluster.

PARTITION         AVAIL  TIMELIMIT   NODES  STATE  NODELIST
scu-cpu*          up     7-00:00:00  18     mix    scu-node[023,032-033,035-047,049,079]
scu-cpu*          up     7-00:00:00  4      alloc  scu-node[020-022,034]
scu-gpu           up     2-00:00:00  4      mix    scu-node[050-051,081-082]
cryo-cpu          up     7-00:00:00  1      idle   scu-node065
cryo-cpu          up     7-00:00:00  1      idle   scu-node002
cryo-cpu          up     7-00:00:00  2      mix    scu-node[001,064]
cryo-cpu          up     7-00:00:00  10     idle   scu-node[063,066-074]
cryo-gpu          up     2-00:00:00  6      mix    scu-node[003-008]
cryo-gpu-v100     up     2-00:00:00  3      mix    scu-node[054-056]
cryo-gpu-p100     up     2-00:00:00  1      mix    scu-node060
cryo-gpu-p100     up     2-00:00:00  2      idle   scu-node[061-062]
boudker-cpu       up     7-00:00:00  1      alloc  scu-node010
boudker-cpu       up     7-00:00:00  1      idle   scu-node009
boudker-gpu       up     7-00:00:00  2      mix    scu-node[011-012]
boudker-gpu-p100  up     7-00:00:00  3      idle   scu-node[057-059]
accardi-gpu       up     2-00:00:00  1      mix    scu-node015
accardi-gpu       up     2-00:00:00  2      alloc  scu-node[013-014]
accardi-gpu2      up     2-00:00:00  1      idle   scu-node016
accardi-cpu       up     7-00:00:00  1      idle   scu-node017
sackler-gpu       up     7-00:00:00  1      mix    scu-node018
sackler-cpu       up     7-00:00:00  1      mix    scu-node019
hwlab-rocky-cpu   up     7-00:00:00  3      idle   scu-node[052-053,099]
hwlab-rocky-gpu   up     7-00:00:00  12     mix    scu-node[085-096]
scu-res           up     7-00:00:00  1      idle   scu-login03
eliezer-gpu       up     7-00:00:00  1      idle   scu-node097

Each column header in the sinfo output is described below.

  • PARTITION: The name of the partition, which is a set of compute nodes grouped logically. An asterisk (*) marks the default partition.

  • AVAIL: The availability of the partition (for example, up or down).

  • TIMELIMIT: The maximum job execution walltime for the partition, in days-hours:minutes:seconds.

  • NODES: The number of nodes in the partition that are in the listed state.

  • STATE: The state of the nodes in that row. The states shown above are:

      mix: Only part of the node is allocated to one or more jobs, and the rest is idle.

      alloc: All resources on the node(s) are in use.

      idle: The node is idle, and none of its resources are in use.

  • NODELIST: The names of the nodes in that row.
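To illustrate how these columns fit together, the sketch below sums the NODES column for rows whose STATE is idle, grouped by partition. The function name idle_nodes_by_partition is hypothetical, and the sample rows are copied from the sinfo output above. It matches the state exactly, so on a live system rows whose state carries a suffix (such as idle~) would need extra handling.

```shell
# Sum idle nodes per partition from default sinfo output read on stdin.
# Columns: $1=PARTITION, $4=NODES, $5=STATE.
idle_nodes_by_partition() {
    awk 'NR > 1 && $5 == "idle" { count[$1] += $4 }
         END { for (p in count) print p, count[p] }' | sort
}

# Rows taken from the output shown above.
idle_nodes_by_partition <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
scu-cpu* up 7-00:00:00 18 mix scu-node[023,032-033,035-047,049,079]
scu-gpu up 2-00:00:00 4 mix scu-node[050-051,081-082]
cryo-cpu up 7-00:00:00 1 idle scu-node065
cryo-cpu up 7-00:00:00 1 idle scu-node002
cryo-cpu up 7-00:00:00 10 idle scu-node[063,066-074]
EOF
```

On the cluster, the same function would be fed directly: sinfo | idle_nodes_by_partition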

 

To request a specific number of GPUs, add the --gres option to your srun or sbatch command.

The example below requests 1 GPU. You can request up to 4 GPUs on a single node.

--gres=gpu:1
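A small, hypothetical helper can guard against asking for more GPUs than a single node offers. The sketch below builds the --gres value and rejects counts outside the 1 to 4 range described above. The function name gres_flag is an illustration, not part of SLURM.

```shell
# Build a --gres flag for N GPUs, rejecting anything outside 1-4
# (4 is the per-node maximum stated above).
gres_flag() {
    n="$1"
    case "$n" in
        ''|*[!0-9]*)
            echo "GPU count must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt 4 ]; then
        echo "GPU count must be between 1 and 4" >&2
        return 1
    fi
    printf '%s\n' "--gres=gpu:$n"
}

gres_flag 2    # prints: --gres=gpu:2
```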

SRUN: Interactive Session

Example:
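Assembled from the flags in the breakdown that follows, an interactive request might look like the sketch below. The run helper and the DRY_RUN variable are illustrative additions, not SLURM features: by default the command is only printed so it can be reviewed, and setting DRY_RUN=0 on a login node executes it. partition_name is a placeholder.

```shell
# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Replace partition_name with a real partition, such as scu-gpu.
run srun --gres=gpu:1 --partition=partition_name --time=01:00:00 \
    --mem=8G --cpus-per-task=4 --pty bash
```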

Breakdown:

  • --gres=gpu:1: Allocates 1 GPU to your job.

  • --partition=partition_name: Specifies the partition to run the job in. Replace partition_name with the appropriate partition, like scu-gpu.

  • --time=01:00:00: Requests 1 hour of runtime. Adjust the time as needed.

  • --mem=8G: Requests 8 GB of memory.

  • --cpus-per-task=4: Requests 4 CPU cores.

  • --pty bash: Launches an interactive bash shell after resources are allocated.

 

SBATCH: Submits a job script for later execution.

The shebang (#!) at the beginning of a script tells the system which interpreter to use for executing the commands. In a Slurm script, it specifies that the script should be run using the Bash shell (for example, #!/bin/bash).

In Slurm, lines beginning with #SBATCH are treated as directives that set job options. To disable a directive, add a second # at the beginning. For example, #SBATCH is a directive, while ##SBATCH is a comment.

The #SBATCH lines in the script below contain directives that are recommended as defaults for all job submissions.
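A minimal sketch of such a job script is shown here. The job name, partition, and resource values are assumptions chosen to match the srun example earlier on this page, not the cluster's official defaults, so adjust them to your job. The ##SBATCH line shows a directive that has been commented out.

```shell
#!/bin/bash
#SBATCH --job-name=example_job      # hypothetical job name
#SBATCH --partition=scu-cpu         # general CPU partition
#SBATCH --time=01:00:00             # walltime limit (HH:MM:SS)
#SBATCH --mem=8G                    # memory for the job
#SBATCH --cpus-per-task=4           # CPU cores per task
#SBATCH --output=%x_%j.out          # log file: <job-name>_<job-id>.out
##SBATCH --gres=gpu:1               # disabled: the second # makes this a comment

# SLURM sets these variables inside a job. The fallbacks let the script
# also run outside SLURM for a quick syntax check.
job_id="${SLURM_JOB_ID:-none}"
threads="${SLURM_CPUS_PER_TASK:-1}"

echo "Job ${job_id} on $(uname -n) using ${threads} CPU core(s)"
```

Because #SBATCH lines are ordinary comments to Bash, the script can be run directly with bash to check for syntax errors before it is submitted.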

 

Additional flags to add to sbatch script

 


Submit the batch script

After the job has been submitted, you should get an output similar to the one below but with a different jobid.
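sbatch confirms a submission with a line of the form "Submitted batch job <jobid>". When later commands need the job ID, it can be pulled out of that line, as in the sketch below. The function name job_id_from_sbatch, the script name my_job.sh, and the job ID 1234567 are placeholders.

```shell
# Extract the numeric job ID from sbatch's confirmation line on stdin.
# Prints nothing if the line does not match (for example, on an error).
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

echo "Submitted batch job 1234567" | job_id_from_sbatch   # prints: 1234567

# On the cluster (my_job.sh is a placeholder):
#   job_id=$(sbatch my_job.sh | job_id_from_sbatch)
```

sbatch also has a --parsable option that prints the job ID without the surrounding text, which avoids the parsing step.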

You can use the command below to check the progress of your submitted job in the queue.

Syntax: squeue -u <your-cwid>
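For a quick summary of a longer queue listing, the sketch below counts jobs by the ST (state) column of the default squeue output, where R means running and PD means pending. The function name jobs_by_state and the sample rows are illustrative. It assumes job names contain no spaces, since a space would shift the columns.

```shell
# Count jobs per state from default squeue output read on stdin.
# Column 5 (ST) holds the short state code, e.g. R or PD.
jobs_by_state() {
    awk 'NR > 1 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Illustrative sample, not real cluster data.
jobs_by_state <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1234567 scu-cpu train cwid R 10:02 1 scu-node023
1234568 scu-gpu align cwid PD 0:00 1 (Resources)
1234569 scu-cpu prep cwid R 1:15 1 scu-node032
EOF
```

On the cluster: squeue -u <your-cwid> | jobs_by_state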

output

 


Scontrol

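The scontrol command shows detailed information about a job, such as its state and the resources assigned to it.

Syntax: scontrol show jobid <jobid>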
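scontrol prints its details as KEY=VALUE pairs, several to a line, so a single field can be picked out with standard text tools. The sketch below extracts the JobState value. The function name job_state and the sample lines are illustrative and show only a fragment of what scontrol prints.

```shell
# Print the value of the JobState field from scontrol output on stdin.
job_state() {
    # Put each KEY=VALUE pair on its own line, then keep JobState's value.
    tr ' ' '\n' | sed -n 's/^JobState=//p'
}

# Illustrative fragment of scontrol output.
printf 'JobId=1234567 JobName=train\n   JobState=RUNNING Reason=None Dependency=(null)\n' | job_state
```

On the cluster: scontrol show jobid <jobid> | job_state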

output

 


Terminating Jobs

The scancel command cancels a job in the queue, whether it is pending or running.

Syntax: scancel <jobid> or skill <jobid>

Or