Basic SLURM Commands
Commands | Syntax | Description |
---|---|---|
| `sbatch` | `sbatch <script_name.sh>` | Submit a batch script to SLURM for processing. |
| `squeue` | `squeue -u <cwid>` | Show information about your job(s) in the queue. When run without the `-u` flag, the command shows all jobs in the queue. |
| `srun` | `srun <resource flags> --pty bash` | Run jobs interactively on the cluster. |
| `scancel` | `scancel <jobid>` | End or cancel a queued job. |
| `sacct` | `sacct` | Show information about current and previous jobs. |
| `sinfo` | `sinfo` | Check the status of the cluster and partitions, including availability, time limits, and the number of nodes. |
Requesting Resources
General partitions available to all users on the BRB cluster:
scu-cpu: 22 CPU nodes, 7-day runtime limit
scu-gpu: 4 GPU nodes, 2-day runtime limit
Syntax: `sinfo` or `sinfo [optional flags]`
sinfo
Output: The output below lists all of the partitions on the BRB cluster.
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
scu-cpu* up 7-00:00:00 18 mix scu-node[023,032-033,035-047,049,079]
scu-cpu* up 7-00:00:00 4 alloc scu-node[020-022,034]
scu-gpu up 2-00:00:00 4 mix scu-node[050-051,081-082]
cryo-cpu up 7-00:00:00 1 idle scu-node065
cryo-cpu up 7-00:00:00 1 idle scu-node002
cryo-cpu up 7-00:00:00 2 mix scu-node[001,064]
cryo-cpu up 7-00:00:00 10 idle scu-node[063,066-074]
cryo-gpu up 2-00:00:00 6 mix scu-node[003-008]
cryo-gpu-v100 up 2-00:00:00 3 mix scu-node[054-056]
cryo-gpu-p100 up 2-00:00:00 1 mix scu-node060
cryo-gpu-p100 up 2-00:00:00 2 idle scu-node[061-062]
boudker-cpu up 7-00:00:00 1 alloc scu-node010
boudker-cpu up 7-00:00:00 1 idle scu-node009
boudker-gpu up 7-00:00:00 2 mix scu-node[011-012]
boudker-gpu-p100 up 7-00:00:00 3 idle scu-node[057-059]
accardi-gpu up 2-00:00:00 1 mix scu-node015
accardi-gpu up 2-00:00:00 2 alloc scu-node[013-014]
accardi-gpu2 up 2-00:00:00 1 idle scu-node016
accardi-cpu up 7-00:00:00 1 idle scu-node017
sackler-gpu up 7-00:00:00 1 mix scu-node018
sackler-cpu up 7-00:00:00 1 mix scu-node019
hwlab-rocky-cpu up 7-00:00:00 3 idle scu-node[052-053,099]
hwlab-rocky-gpu up 7-00:00:00 12 mix scu-node[085-096]
scu-res up 7-00:00:00 1 idle scu-login03
eliezer-gpu up 7-00:00:00 1 idle scu-node097
Header | Description |
---|---|
| PARTITION | The list of the cluster's partitions; a partition is a set of compute nodes grouped logically. |
| AVAIL | The availability state of the partition (up or down). |
| TIMELIMIT | The maximum job execution time (walltime) allowed on the partition. |
| NODES | The total number of nodes per partition. |
| STATE | The state of the nodes (e.g., alloc, mix, idle, down). |
| NODELIST | The list of nodes per partition. |
To request a specific number of GPUs, add your request to your srun/sbatch command.
Below is an example of requesting 1 GPU; you can request up to 4 GPUs on a single node:
`--gres=gpu:1`
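For example, the request can be passed on the srun/sbatch command line or embedded as a directive in a batch script (a minimal sketch; the script name is a placeholder):

```bash
# On the command line when submitting a batch job
sbatch --gres=gpu:1 myscript.sh

# Or as a directive inside the batch script itself
#SBATCH --gres=gpu:1
```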
SRUN: Interactive Session
Example:
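A sketch of the full interactive command, assembled from the flags explained in the breakdown below (the `scu-gpu` partition is used here for illustration):

```bash
# Request an interactive session with 1 GPU, 4 CPU cores, 8 GB of memory, for 1 hour
srun --partition=scu-gpu --gres=gpu:1 --time=01:00:00 --mem=8G --cpus-per-task=4 --pty bash
```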
Breakdown:
- `--gres=gpu:1`: Allocates 1 GPU to your job.
- `--partition=partition_name`: Specifies the partition to run the job in. Replace `partition_name` with the appropriate partition, like `scu-gpu`.
- `--time=01:00:00`: Requests 1 hour of runtime. Adjust the time as needed.
- `--mem=8G`: Requests 8 GB of memory.
- `--cpus-per-task=4`: Requests 4 CPU cores.
- `--pty bash`: Launches an interactive bash shell after resources are allocated.
SBATCH: `sbatch` is used to submit a job script for later execution.
The shebang (`#!`) at the beginning of a script tells the shell which interpreter to use for executing the commands. In a Slurm script, it specifies that the script should be run using the Bash interpreter (`#!/bin/bash`).
In Slurm, lines beginning with `#SBATCH` are treated as directives. To comment out a Slurm directive, add a second `#` at the beginning. For example, `#SBATCH` is a directive, while `##SBATCH` indicates a comment.
The `#SBATCH` lines in the script below contain directives that are recommended as defaults for all job submissions.
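As an illustration, a minimal batch script with commonly used directives might look like the sketch below; the values are placeholders to adapt, not site-recommended defaults:

```bash
#!/bin/bash
#SBATCH --job-name=my_job         # Name shown in the queue
#SBATCH --partition=scu-cpu       # Partition to submit to
#SBATCH --nodes=1                 # Number of nodes
#SBATCH --ntasks=1                # Number of tasks
#SBATCH --cpus-per-task=4         # CPU cores per task
#SBATCH --mem=8G                  # Memory for the job
#SBATCH --time=01:00:00           # Wall-clock limit (HH:MM:SS)
#SBATCH --output=job_%j.out       # Standard output file (%j expands to the job ID)
#SBATCH --error=job_%j.err        # Standard error file

# Commands to run go below the directives
echo "Running on $(hostname)"
```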
Additional flags to add to the sbatch script
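For illustration, a few commonly used optional directives (a sketch; the values are placeholders and not specific to this cluster):

```bash
#SBATCH --mail-user=<your_email>     # Email address for job notifications
#SBATCH --mail-type=BEGIN,END,FAIL   # When to send email notifications
#SBATCH --gres=gpu:1                 # Request 1 GPU (GPU partitions only)
#SBATCH --array=1-10                 # Submit a job array of 10 tasks
```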
Submit the batch script
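For example, assuming the script is saved as `job.sh` (the filename is only an illustration):

```bash
sbatch job.sh
```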
After the job has been submitted, you should get an output similar to the one below, but with a different `jobid`.
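sbatch confirms the submission by printing the assigned job ID, in a line of the form below (the job ID shown is illustrative):

```
Submitted batch job 481234
```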
You can use the command below to check the progress of your submitted job in the queue.
Syntax: `squeue -u <your cwid>`
Output:
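An illustrative example of the default squeue output (the values are placeholders):

```
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
481234   scu-gpu   my_job     cwid  R       5:12      1 scu-node050
```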
Scontrol
The `scontrol show` command displays detailed information about a specific job.
Syntax: `scontrol show jobid <jobid>`
Output:
Terminating Jobs
The scancel command is used to kill or end your job in the queue, whether it is pending or running.
Syntax: `scancel <jobid>` or `skill <jobid>`