Table of Contents | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
...
Name | Mount point | Size | Use | Is backed up? | Comment |
---|---|---|---|---|---|
Home |
| 2Tb | home filesystem. Used to keep small files, configs, codes, scripts, etc | no | have has limited space. It is only used for small files |
Midtier |
| varies per lab | each lab has an allocation under intended for data that is actively being used or processed, research datasets | no | |
AI GPFS |
| 700Tb | tbd | no | Parallel file system for data intensive workloads. Limited access, granted on special requests. |
...
Computational jobs on the AI cluster are managed with a SLURM job manager. We provide an in-depth tutorial on how to use SLURM <placeholder>, but some basic examples that are immediately applicable on the AI cluster will be discussed in this section.
Important notice
Warning |
---|
Do not run computations on login nodes |
Running your application code directly without submitting it through the scheduler is prohibited. Login nodes are shared resources and they are reserved for light tasks like file management and job submission. Running heavy computations on login nodes can degrade performance for all users. Instead, please submit your compute jobs to the appropriate SLURM queue, which is designed to handle such workloads efficiently.
Batch vs interactive jobs
There are two mechanisms to run SLURM jobs: “batch” and “interactive”. Interactive jobs are an inefficient way to utilize the cluster. By their nature, these jobs require the system to wait for user input, leaving the allocated resources idle during those periods. Since HPC clusters are designed to maximize resource utilization and efficiency, having nodes sit idle while still consuming CPU, memory, or GPU resources is counterproductive.
...