Cluster nodes are managed through Slurm. Running jobs on the login nodes, or accessing compute nodes directly via ssh, is strictly forbidden.
| Command | Syntax | Description |
|---|---|---|
| `sbatch` | `sbatch <job-script>` | Submit a batch script to Slurm for processing. |
| `squeue` | `squeue -u <username>` | Show information about your job(s) in the queue. When run without the `-u` flag, it lists all jobs in the queue. |
| `srun` | `srun <resource-parameters>` | Run jobs interactively on the cluster. |
| `scancel` | `scancel <job-id>` | End or cancel a queued job. |
| `sacct` | `sacct` | Show information about current and previous jobs. |
| `sinfo` | `sinfo` | Check the status of the cluster and its partitions, including availability, time limits, and the number of nodes. |
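For example, a typical workflow with these commands might look like the sketch below; the script name `my_job.sh` and the job ID `123456` are placeholders.

```bash
# Submit a batch script; Slurm prints the assigned job ID
sbatch my_job.sh

# Check the status of your own jobs in the queue
squeue -u $USER

# Inspect cluster/partition availability before choosing a partition
sinfo

# Cancel a queued or running job by its job ID
scancel 123456

# Review accounting information for a current or finished job
sacct -j 123456
```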
...
```bash
srun --partition=partition_name --time=01:00:00 --gres=gpu:1 --mem=8G --cpus-per-task=4 --pty bash -i
```
Breakdown:

- `--gres=gpu:1`: Allocates 1 GPU to your job.
- `--partition=partition_name`: Specifies the partition to run the job in. Replace `partition_name` with the appropriate partition, like `scu-gpu`.
- `--time=01:00:00`: Requests 1 hour of runtime. Adjust the time as needed.
- `--mem=8G`: Requests 8 GB of memory.
- `--cpus-per-task=4`: Requests 4 CPU cores.
- `--pty bash`: Launches an interactive bash shell after resources are allocated.
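As a concrete example, an interactive GPU session on the `scu-gpu` partition could be requested as sketched below; `nvidia-smi` is shown only as a quick check, assuming the NVIDIA tools are installed on the GPU nodes.

```bash
# Request an interactive session on the scu-gpu partition with 1 GPU
srun --partition=scu-gpu --time=01:00:00 --gres=gpu:1 --mem=8G --cpus-per-task=4 --pty bash -i

# Once the shell starts on the compute node, verify the GPU allocation
nvidia-smi
```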
...
```bash
#!/bin/bash
#SBATCH --job-name=cpu_job              # Job name
#SBATCH --partition=cpu_partition       # Partition to run the job in (e.g., scu-cpu)
#SBATCH --time=01:00:00                 # Max runtime (1 hour)
#SBATCH --mem=8G                        # Memory requested
#SBATCH --cpus-per-task=4               # Number of CPU cores per task
#SBATCH --output=job_output-%j.out      # Standard output file
#SBATCH --error=job_error-%j.err        # Error output file

# Your commands here
srun python my_script.py
```
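To run this job, save the script (the filename `cpu_job.sh` below is a placeholder) and submit it with `sbatch`; the output and error files are named using the job ID because of the `%j` placeholder.

```bash
# Submit the batch script (assuming it is saved as cpu_job.sh)
sbatch cpu_job.sh

# Monitor the job while it is queued or running
squeue -u $USER

# After the job finishes, inspect the output and error files
# (%j in the script is replaced by the numeric job ID)
ls job_output-*.out job_error-*.err
```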
Additional flags that can be added to the sbatch script above:
```bash
# Request 1 GPU
#SBATCH --gres=gpu:1

# Set email notifications (optional)
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=your_email@example.com
```
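Putting these flags together, a GPU batch script might look like the sketch below; the job name, script name, and email address are illustrative placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job                  # Job name (illustrative)
#SBATCH --partition=scu-gpu                 # GPU partition mentioned above
#SBATCH --time=01:00:00                     # Max runtime (1 hour)
#SBATCH --mem=8G                            # Memory requested
#SBATCH --cpus-per-task=4                   # Number of CPU cores per task
#SBATCH --gres=gpu:1                        # Request 1 GPU
#SBATCH --output=job_output-%j.out          # Standard output file
#SBATCH --error=job_error-%j.err            # Error output file
#SBATCH --mail-type=BEGIN,END,FAIL          # Optional email notifications
#SBATCH --mail-user=your_email@example.com  # Replace with your email address

# Your commands here (my_gpu_script.py is a placeholder)
srun python my_gpu_script.py
```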
...