Page Comparison

...

Name	Mount point	Size	Use	Is backed up?	Comment
Home	/home	2 TB	Home filesystem. Used to keep small files, configs, code, scripts, etc.	no	Has limited space. It is only used for small files.
Midtier	/midtier/<labname>	varies per lab	Each lab has an allocation under /midtier/<labname>/scratch/<cwid>. Intended for data that is actively being used or processed, such as research datasets.	no
AI GPFS	/bhii	700 TB	tbd	no	Parallel file system for data-intensive workloads. Limited access, granted on special request.
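Because the home filesystem has limited space, it can help to check how full a storage area is before writing data to it. The following is a minimal sketch using the standard `df` utility. The `/bhii` mount point comes from the table above; `MIDTIER_DIR` is a hypothetical variable that you would set to your own lab's directory, since the midtier path contains site-specific placeholders.

```shell
# MIDTIER_DIR is an assumption: set it yourself to your lab's midtier directory.
MIDTIER_DIR="${MIDTIER_DIR:-}"

for dir in "$HOME" /bhii "$MIDTIER_DIR"; do
    # Skip empty entries and areas that are not mounted or not accessible to you.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        continue
    fi
    echo "== $dir =="
    # Show total size, used space and available space of the underlying filesystem.
    df -h "$dir"
done
```

To see how much of the space is taken by your own files, `du -sh "$HOME"` reports the total size of your home directory; this can take a while on large directory trees.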

...

Access to applications is managed with modules. Refer to <placeholder> for a detailed tutorial on modules; the following is a quick list of commands that can be used on the AI cluster:

Code Block

language	bash

# list all the available modules:
module avail
# list currently loaded modules:
module list
# load a module:
module load <module_name>
# unload the module:
module unload <module_name>
# swap versions of the application
module swap <module_name>/<version1> <module_name>/<version2>
# unload all modules
module purge
# get help
module help
# get more info for a particular module
module help <module_name>

...

Computational jobs on the AI cluster are managed with the SLURM job manager. We provide an in-depth tutorial on how to use SLURM <placeholder>, but this section discusses some basic examples that are immediately applicable on the AI cluster.

Important notice

Warning
Do not run computations on login nodes

Running your application code directly without submitting it through the scheduler is prohibited. Login nodes are shared resources and they are reserved for light tasks like file management and job submission. Running heavy computations on login nodes can degrade performance for all users. Instead, please submit your compute jobs to the appropriate SLURM queue, which is designed to handle such workloads efficiently.

Batch vs interactive jobs

There are two mechanisms to run SLURM jobs: “batch” and “interactive”. Interactive jobs are an inefficient way to utilize the cluster. By their nature, these jobs require the system to wait for user input, leaving the allocated resources idle during those periods. Since HPC clusters are designed to maximize resource utilization and efficiency, having nodes sit idle while still consuming CPU, memory, or GPU resources is counterproductive.
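To make the batch mechanism concrete, here is a minimal sketch of writing and submitting a batch script. The job name, resource values and file names are illustrative assumptions, not cluster defaults, and no partition is specified because the AI cluster's partition names are not listed here. The `#SBATCH` options shown are standard SLURM directives.

```shell
# Write a minimal batch script; every resource value below is an illustrative assumption.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example_job       # hypothetical job name
#SBATCH --ntasks=1                   # run a single task
#SBATCH --cpus-per-task=2            # CPU cores for that task
#SBATCH --mem=4G                     # memory for the job
#SBATCH --time=00:10:00              # wall-clock limit (HH:MM:SS)
#SBATCH --output=example_job_%j.out  # %j expands to the job ID

echo "Running on $(hostname)"
EOF

# Check the script for shell syntax errors without executing it.
bash -n example_job.sh

# Submit only where SLURM is available, i.e. on the cluster itself.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh    # queue the job; it runs without further user input
    squeue -u "$USER"        # list your pending and running jobs
else
    echo "sbatch not found; script written to example_job.sh"
fi
```

A batch job like this releases its resources as soon as the script finishes. By contrast, an interactive session (commonly requested with `srun --pty bash`) holds its allocation until you exit the shell, which is why it is better suited to short debugging sessions than to long computations.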

...

Version	Old Version 38	New Version Current
Changes made by	eud4002	eud4002
Saved on	Jan 30, 2025	Jan 31, 2025
