...
Computational jobs on the AI cluster are managed by the SLURM workload manager. We provide an in-depth tutorial on using SLURM <placeholder>; this section covers a few basic examples that apply directly to the AI cluster.
Important notice
...
Warning: Do not run computations on login nodes.
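If you need to work interactively (for example, to debug a script), one option is to ask SLURM for a shell on a compute node rather than working on the login node. The sketch below is illustrative only: the function name `interactive_shell`, the one-hour default, and the single-task request are assumptions, and any partition or account options the AI cluster may require are not shown.

```shell
# Hypothetical helper: open an interactive shell on a compute node via srun.
# Default time limit and task count are assumed example values; add
# --partition/--account options if the cluster requires them.
interactive_shell() {
  local time_limit=${1:-01:00:00}   # wall-clock limit as HH:MM:SS
  srun --ntasks=1 --time="$time_limit" --pty bash -i
}
```

Run `interactive_shell` (or, for example, `interactive_shell 02:00:00`) and exit the shell when you are done so the allocation is released.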
...
Stopping and monitoring SLURM jobs
To stop (cancel) a SLURM job, use:
```shell
scancel <job_id>
```
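`scancel` can also select jobs by owner, state, or name instead of by ID. The functions below are a hypothetical convenience sketch, assuming your login name (`$USER`) is the same as the `<cwid>` used elsewhere on this page; `--user`, `--state`, and `--name` are standard `scancel` options.

```shell
# Hypothetical helpers around scancel; $USER is assumed to equal your <cwid>.

# Cancel every job you own, whether pending or running.
cancel_all_my_jobs() {
  scancel --user="$USER"
}

# Cancel only your jobs that are still waiting in the queue.
cancel_my_pending_jobs() {
  scancel --user="$USER" --state=PENDING
}

# Cancel your jobs with a given job name, e.g. `cancel_my_jobs_named train`.
cancel_my_jobs_named() {
  scancel --user="$USER" --name="$1"
}
```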
Once a job is running, a few tools can help you monitor its status. Again, refer to <placeholder> for a detailed SLURM tutorial; here is a list of some useful commands:
```shell
# show the status of the queue (long format)
squeue -l
# only list jobs by a specific user
squeue -l -u <cwid>
# print partition and node status information
sinfo
# print detailed info about a job
scontrol show job <job_id>
# print detailed info about a node
scontrol show node <node_name>
# list all jobs run within the last 7 days (uses GNU date syntax)
sacct -u <cwid> -S "$(date -d "-7 days" +%D)" -o "User,JobID,JobName,State,ExitCode"
```
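For a quick overview of how recent jobs ended, `sacct` output can be made machine-readable with `-P` (pipe-separated fields) and `-n` (no header line), and `-X` limits the listing to whole jobs so that job steps such as `batch` are not counted separately. The function name `summarize_job_states` is hypothetical, and the sketch assumes the field list used above, where `State` is the fourth column.

```shell
# Hypothetical helper: count jobs per final state from pipe-separated sacct output.
# Expects lines like "alice|1234|train|COMPLETED|0:0" on stdin (State = 4th field).
# Only the first word of State is kept, so "CANCELLED by 1234" counts as CANCELLED.
summarize_job_states() {
  awk -F'|' 'NF >= 4 { split($4, st, " "); count[st[1]]++ }
             END { for (s in count) print s, count[s] }' | sort
}

# Example use ("$USER" is assumed to equal your <cwid>):
#   sacct -u "$USER" -S "$(date -d '-7 days' +%D)" -X -P -n \
#         -o "User,JobID,JobName,State,ExitCode" | summarize_job_states
```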