
...

Computational jobs on the AI cluster are managed by the SLURM workload manager. An in-depth tutorial on how to use SLURM is available at <placeholder>; this section covers a few basic examples that apply directly to the AI cluster.

Important notice

...

Warning

Do not run computations on login nodes

...

Stopping and monitoring SLURM jobs

To stop (cancel) a SLURM job, use:

Code Block
languagebash
scancel <job_id>
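
To cancel several jobs at once, `scancel` can also filter by user and job state instead of taking a single job ID. The following is a minimal sketch, not a cluster-provided tool: the helper name `cancel_pending` and the `DRY_RUN` variable are hypothetical and exist only for illustration, while `--user` and `--state` are standard `scancel` options.

```shell
# cancel_pending (hypothetical helper): cancel all PENDING jobs of one user.
# Set DRY_RUN=1 to print the scancel command instead of executing it.
cancel_pending() {
    user="$1"
    if [ -z "$user" ]; then
        echo "usage: cancel_pending <cwid>" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # only show what would be run
        echo "scancel --user=$user --state=PENDING"
    else
        # requires SLURM; cancels only jobs that have not started yet
        scancel --user="$user" --state=PENDING
    fi
}
```

Running `DRY_RUN=1 cancel_pending <cwid>` first shows the exact command before anything is cancelled.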

Once the job is running, a few tools can help you monitor its status. Again, refer to <placeholder> for a detailed SLURM tutorial; the most useful commands are listed below:

Code Block
languagebash
# show status of the queue
squeue -l                      
# only list jobs by a specific user
squeue -l -u <cwid>            
# print partitions info
sinfo                          
# print detailed info about a job
scontrol show job <job_id>
# print detailed info about a node
scontrol show node <node_name>
# list all of a user's jobs that ran within the last 7 days
sacct -u <cwid> -S $(date -d "-7 days" +%D) -o "User,JobID,JobName,State,ExitCode"
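
The queue listing can be condensed into a per-state summary when many jobs are in flight. The sketch below assumes one job state per line on standard input; the helper name `count_states` is hypothetical, while `-h` (no header) and `-o "%T"` (job state) are standard `squeue` options.

```shell
# count_states (hypothetical helper): read one job state per line on stdin
# and print "<count> <state>" for each distinct state, sorted by state name.
count_states() {
    sort | uniq -c | awk '{print $1, $2}'
}

# Usage on the cluster (requires SLURM):
#   squeue -h -u "$USER" -o "%T" | count_states
```

Keeping the counting step separate from the `squeue` call means the same helper also works on the `State` column of `sacct` output.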