Command | Syntax | Description |
---|---|---|
`sbatch` | `sbatch <script.sh>` | Submit a batch script to SLURM for processing. |
`squeue` | `squeue -u <username>` | Show information about your job(s) in the queue. When run without any flags, the command shows every job currently in the queue. |
`srun` | `srun [options] <command>` | Run jobs interactively on the cluster. |
`scancel` | `scancel <job_id>` | End or cancel a queued job. |
`sacct` | `sacct -j <job_id>` | Show information about current and previous jobs. |
`sinfo` | `sinfo [options]` | Check the status of the cluster and its partitions, including availability, time limits, and the number of nodes. |
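As a quick illustration, a typical job lifecycle using these commands might look like the sketch below (the script name and job ID are placeholders):

```bash
sbatch my_job.sh        # submit a batch script; SLURM replies with a job ID
squeue -u $USER         # list your jobs that are queued or running
scancel 123456          # cancel the job with ID 123456 if it is no longer needed
sacct -j 123456         # show accounting information for that job
```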
Requesting Resources
General SciComp partitions for all users on the BRB cluster:
scu-cpu: 22 CPU nodes, 7-day runtime limit
scu-gpu: 6 GPU nodes, 2-day runtime limit
Syntax: `sinfo` or `sinfo [optional flags]`
Output: The output below lists every partition on the BRB cluster.
```
PARTITION        AVAIL  TIMELIMIT   NODES  STATE  NODELIST
scu-cpu*         up     7-00:00:00     18  mix    scu-node[023,032-033,035-047,049,079]
scu-cpu*         up     7-00:00:00      4  alloc  scu-node[020-022,034]
scu-gpu          up     2-00:00:00      4  mix    scu-node[050-051,081-082]
cryo-cpu         up     7-00:00:00      1  idle   scu-node065
cryo-cpu         up     7-00:00:00      1  idle   scu-node002
cryo-cpu         up     7-00:00:00      2  mix    scu-node[001,064]
cryo-cpu         up     7-00:00:00     10  idle   scu-node[063,066-074]
cryo-gpu         up     2-00:00:00      6  mix    scu-node[003-008]
cryo-gpu-v100    up     2-00:00:00      3  mix    scu-node[054-056]
cryo-gpu-p100    up     2-00:00:00      1  mix    scu-node060
cryo-gpu-p100    up     2-00:00:00      2  idle   scu-node[061-062]
boudker-cpu      up     7-00:00:00      1  alloc  scu-node010
boudker-cpu      up     7-00:00:00      1  idle   scu-node009
boudker-gpu      up     7-00:00:00      2  mix    scu-node[011-012]
boudker-gpu-p100 up     7-00:00:00      3  idle   scu-node[057-059]
accardi-gpu      up     2-00:00:00      1  mix    scu-node015
accardi-gpu      up     2-00:00:00      2  alloc  scu-node[013-014]
accardi-gpu2     up     2-00:00:00      1  idle   scu-node016
accardi-cpu      up     7-00:00:00      1  idle   scu-node017
sackler-gpu      up     7-00:00:00      1  mix    scu-node018
sackler-cpu      up     7-00:00:00      1  mix    scu-node019
hwlab-rocky-cpu  up     7-00:00:00      3  idle   scu-node[052-053,099]
hwlab-rocky-gpu  up     7-00:00:00     12  mix    scu-node[085-096]
scu-res          up     7-00:00:00      1  idle   scu-login03
eliezer-gpu      up     7-00:00:00      1  idle   scu-node097
```
Header | Description |
---|---|
PARTITION | The list of the cluster's partitions. A partition is a set of compute nodes grouped logically. |
AVAIL | The active state of the partition (up or down). |
TIMELIMIT | The maximum job execution time (walltime) allowed on the partition. |
NODES | The total number of nodes per partition. |
STATE | The state of the nodes (e.g., idle, mix, alloc). |
NODELIST | The list of nodes per partition. |
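To narrow the output to a single partition, `sinfo` accepts the `--partition`/`-p` flag; for example:

```bash
sinfo -p scu-gpu    # show availability, time limit, and node states for scu-gpu only
```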
To request a specific number of GPUs, add the request to your srun/sbatch command. Below is an example of requesting 1 GPU; you can request up to 4 GPUs on a single node:
`--gres=gpu:1`
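The same request can also be written as `#SBATCH` directives in a batch script; the sketch below asks for 2 GPUs (the partition name is illustrative, use one you have access to):

```bash
#SBATCH --gres=gpu:2          # request 2 GPUs on one node (up to 4 per node)
#SBATCH --partition=scu-gpu   # example partition; substitute as appropriate
```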
SRUN: Interactive Session
Example:
```bash
srun --gres=gpu:1 --partition=partition_name --time=01:00:00 --mem=8G --cpus-per-task=4 --pty bash
```
Breakdown:
- `--gres=gpu:1`: Allocates 1 GPU to your job.
- `--partition=partition_name`: Specifies the partition to run the job in. Replace `partition_name` with the appropriate partition, like `scu-gpu`.
- `--time=01:00:00`: Requests 1 hour of runtime. Adjust the time as needed.
- `--mem=8G`: Requests 8 GB of memory.
- `--cpus-per-task=4`: Requests 4 CPU cores.
- `--pty bash`: Launches an interactive bash shell after resources are allocated.
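Once the shell starts, you are working directly on a compute node. A quick sanity check (assuming the GPU node has the NVIDIA tools installed) might look like:

```bash
hostname              # confirm you are on a compute node, not the login node
nvidia-smi            # list the GPU(s) allocated to this session
echo $SLURM_JOB_ID    # print the job ID SLURM assigned to the session
exit                  # leave the shell and release the allocation
```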
SBATCH: submission script
```bash
#!/bin/bash
#SBATCH --job-name=gpu_job          # Job name
#SBATCH --output=output_file.txt    # Output file
#SBATCH --partition=gpu_partition   # Partition to run the job (e.g., scu-gpu)
#SBATCH --gres=gpu:1                # Request 1 GPU
#SBATCH --time=01:00:00             # Max runtime (1 hour)
#SBATCH --mem=8G                    # Memory requested
#SBATCH --cpus-per-task=4           # Number of CPU cores per task

# Your commands here
srun python my_script.py
```
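To use the script, save it to a file and submit it with `sbatch`; the file name below is only a placeholder:

```bash
sbatch gpu_job.sh        # submit the script; SLURM prints the assigned job ID
squeue -u $USER          # check the job's state while it is queued or running
cat output_file.txt      # inspect the output file once the job has finished
```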