
Commands

Command   Syntax                       Description
sbatch    sbatch <job-script>          Submit a batch script to SLURM for processing.
squeue    squeue -u <cwid>             Show information about your job(s) in the queue. When run without the -u flag, it lists all jobs in the queue, not just yours.
srun      srun <resource-parameters>   Run jobs interactively on the cluster.
scancel   scancel <job-id>             End or cancel a queued or running job.
sacct     sacct                        Show information about current and previous jobs.
sinfo     sinfo                        Check the status of the cluster and its partitions, including availability, time limits, and the number of nodes.
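
A typical workflow chains these commands together. The script name, job ID, and CWID below are placeholders; substitute your own values.

sbatch my_job.sh          # submit a batch script; SLURM prints the assigned job ID
squeue -u <cwid>          # check the status of your queued and running jobs
sacct -j <job-id>         # review accounting information for a completed job
scancel <job-id>          # cancel the job if it is no longer needed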

Requesting Resources

General SciComp partitions available to all users on the BRB cluster:

  • scu-cpu: 22 CPU nodes, 7-day runtime limit

  • scu-gpu: 6 GPU nodes, 2-day runtime limit


Syntax: sinfo or sinfo [optional flags]

sinfo

Output: The output below lists all of the partitions on the BRB cluster.

PARTITION         AVAIL  TIMELIMIT  NODES  STATE NODELIST
scu-cpu*             up 7-00:00:00     18    mix scu-node[023,032-033,035-047,049,079]
scu-cpu*             up 7-00:00:00      4  alloc scu-node[020-022,034]
scu-gpu              up 2-00:00:00      4    mix scu-node[050-051,081-082]
cryo-cpu             up 7-00:00:00      1   idle scu-node065
cryo-cpu             up 7-00:00:00      1   idle scu-node002
cryo-cpu             up 7-00:00:00      2    mix scu-node[001,064]
cryo-cpu             up 7-00:00:00     10   idle scu-node[063,066-074]
cryo-gpu             up 2-00:00:00      6    mix scu-node[003-008]
cryo-gpu-v100        up 2-00:00:00      3    mix scu-node[054-056]
cryo-gpu-p100        up 2-00:00:00      1    mix scu-node060
cryo-gpu-p100        up 2-00:00:00      2   idle scu-node[061-062]
boudker-cpu          up 7-00:00:00      1  alloc scu-node010
boudker-cpu          up 7-00:00:00      1   idle scu-node009
boudker-gpu          up 7-00:00:00      2    mix scu-node[011-012]
boudker-gpu-p100     up 7-00:00:00      3   idle scu-node[057-059]
accardi-gpu          up 2-00:00:00      1    mix scu-node015
accardi-gpu          up 2-00:00:00      2  alloc scu-node[013-014]
accardi-gpu2         up 2-00:00:00      1   idle scu-node016
accardi-cpu          up 7-00:00:00      1   idle scu-node017
sackler-gpu          up 7-00:00:00      1    mix scu-node018
sackler-cpu          up 7-00:00:00      1    mix scu-node019
hwlab-rocky-cpu      up 7-00:00:00      3   idle scu-node[052-053,099]
hwlab-rocky-gpu      up 7-00:00:00     12    mix scu-node[085-096]
scu-res              up 7-00:00:00      1   idle scu-login03
eliezer-gpu          up 7-00:00:00      1   idle scu-node097
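
sinfo also accepts flags to narrow or reshape the output. The examples below are illustrative; the partition names are taken from the listing above.

sinfo -p scu-gpu          # show only the scu-gpu partition
sinfo -N -l -p scu-cpu    # one line per node in scu-cpu, with extended detail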

Header            Description
PARTITION         The list of the cluster's partitions. A partition is a set of compute nodes grouped logically.
AVAIL             The active state of the partition (up or down).
TIMELIMIT         The maximum job execution walltime per partition.
NODES             The total number of nodes per partition.
STATE             mix: only part of the node is allocated to one or more jobs; the rest is idle. alloc: all of the node's resources are in use. idle: none of the node's resources are in use.
NODELIST(REASON)  The list of nodes per partition.
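
The STATE column can also be used as a filter. For example (the partition name is illustrative), to list only idle or partially allocated nodes:

sinfo -t idle -p scu-gpu    # idle nodes in the scu-gpu partition
sinfo -t mix,alloc          # nodes that are partially or fully allocated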

To request a specific number of GPUs, add the request to your srun or sbatch command.

Below is an example of requesting 1 GPU; you can request up to 4 GPUs on a single node.

--gres=gpu:1
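
For example, to request the per-node maximum of 4 GPUs instead:

--gres=gpu:4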

SRUN: Interactive Session

Example:

srun --gres=gpu:1 --partition=partition_name --time=01:00:00 --mem=8G --cpus-per-task=4 --pty bash

Breakdown:

  • --gres=gpu:1: Allocates 1 GPU to your job.

  • --partition=partition_name: Specifies the partition to run the job in. Replace partition_name with the appropriate partition, like scu-gpu.

  • --time=01:00:00: Requests 1 hour of runtime. Adjust the time as needed.

  • --mem=8G: Requests 8 GB of memory.

  • --cpus-per-task=4: Requests 4 CPU cores.

  • --pty bash: Launches an interactive bash shell after resources are allocated.
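
Once the prompt returns you are on the allocated compute node. A quick way to confirm the allocation (assuming the GPU node provides nvidia-smi) is:

nvidia-smi              # list the GPU(s) allocated to this session
echo $SLURM_JOB_ID      # the job ID assigned to the interactive session
exit                    # leave the shell and release the resources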

SBATCH: Submission Script

#!/bin/bash
#SBATCH --job-name=gpu_job        # Job name
#SBATCH --output=output_file.txt  # Output file
#SBATCH --partition=gpu_partition # Partition to run the job (e.g., scu-gpu)
#SBATCH --gres=gpu:1              # Request 1 GPU
#SBATCH --time=01:00:00           # Max runtime (1 hour)
#SBATCH --mem=8G                  # Memory requested
#SBATCH --cpus-per-task=4         # Number of CPU cores per task

# Your commands here
srun python my_script.py
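
Assuming the script above is saved as gpu_job.sh (the filename is only an example), submit and monitor it with:

sbatch gpu_job.sh                                        # prints "Submitted batch job <job-id>"
squeue -u <cwid>                                         # watch the job while it is pending or running
sacct -j <job-id> --format=JobID,JobName,State,Elapsed   # summary after the job finishes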