
...

Code Block (bash)
# List all files and directories in the scratch directory
ls /midtier/labname/scratch/

# Navigate to a specific subdirectory
cd /midtier/labname/scratch/cwid

# Copy a file from the current directory to another directory
cp data.txt /midtier/labname/scratch/cwid/

# Move the copied file to a different directory
mv /midtier/labname/scratch/cwid/data.txt /midtier/labname/scratch/backup/

# Create a new directory
mkdir /midtier/labname/scratch/cwid/new_project/

Running jobs

Computational jobs on the AI cluster are managed with the SLURM job manager. We provide an in-depth tutorial on how to use SLURM <placeholder>; this section covers some basic examples that are immediately applicable on the AI cluster.

Important notice

Warning: Do not run computations on login nodes

Running your application code directly, without submitting it through the scheduler, is prohibited. Login nodes are shared resources reserved for light tasks such as file management and job submission. Running heavy computations on login nodes degrades performance for all users. Instead, please submit your compute jobs to the appropriate SLURM queue, which is designed to handle such workloads efficiently.

There are two mechanisms for running SLURM jobs: “batch” and “interactive” (both are illustrated after the list below). Interactive jobs are an inefficient way to utilize the cluster: by their nature, they require the system to wait for user input, leaving the allocated resources idle during those periods. Since HPC clusters are designed to maximize resource utilization and efficiency, having nodes sit idle while still holding CPU, memory, or GPU allocations is counterproductive.

For example:

  • If you're running an interactive session and step away or take time to analyze output, the allocated resources remain reserved but unused.

  • This idle time adds up across multiple users, leading to significant underutilization of the cluster.

Because of these considerations, we highly recommend that users execute as much of their computation as possible in batch mode, so that WCM’s research community can make the most of the cluster's capabilities.
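
For reference, the two mechanisms are typically invoked as follows. This is a sketch: the script name is an illustrative placeholder, and the srun options may need to be adapted to the cluster's configuration.

Code Block (bash)
# Batch mode (recommended): the job runs unattended and resources
# are released as soon as the script finishes
sbatch my_job.sh

# Interactive mode (use sparingly): opens a shell on a compute node;
# resources stay reserved for the whole session, even while idle
srun --pty bash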

Code example

To illustrate how to run computational jobs, consider the following toy problem, implemented in C, which estimates the value of π using a random sampling method:

...
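
The listing is omitted above; as a point of reference, a minimal Monte Carlo π estimator in C might look like the sketch below (an illustration, not necessarily the exact code used in this tutorial). It samples random points in the unit square and counts how many land inside the quarter circle of radius 1.

Code Block (c)
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
    /* Number of random samples; can be overridden on the command line */
    long n = (argc > 1) ? atol(argv[1]) : 100000000L;
    long hits = 0;

    srand((unsigned)time(NULL));
    for (long i = 0; i < n; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }

    /* The hit ratio approximates the quarter-circle area, pi/4 */
    printf("pi ~= %.6f\n", 4.0 * (double)hits / (double)n);
    return 0;
}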

Compiling it with gcc will generate an executable a.out that we will use to illustrate how to submit jobs.


SLURM Batch job example

Here is an example of a batch script that can be run on the cluster. In this script we request a single node with 4 CPU cores, used in SMP mode.

...
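
The script itself is omitted above; a sketch of such a batch script is shown below. The job name, time limit, and output file are illustrative assumptions, while the resource requests match the description: one node, with 4 CPU cores assigned to a single SMP task.

Code Block (bash)
#!/bin/bash
#SBATCH --job-name=pi_estimate    # job name shown in the queue (assumed)
#SBATCH --nodes=1                 # request a single node
#SBATCH --ntasks=1                # one task: SMP threads, not MPI ranks
#SBATCH --cpus-per-task=4         # 4 CPU cores for that task
#SBATCH --time=00:10:00           # wall-clock limit (assumed)
#SBATCH --output=pi_%j.out        # output file, %j is the job ID (assumed)

# Run the executable compiled in the previous section
./a.out

Once saved (for example as pi_job.sh), the script is submitted with sbatch pi_job.sh, and its progress can be monitored with squeue -u $USER.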