
Overview

In collaboration with Dr. Mert Sabuncu from Radiology, the ITS team has established the framework for a new high-performance computing (HPC) cluster dedicated to AI/ML workflows, such as training neural networks for imaging and large language models (LLMs).

This cluster features high-memory nodes, NVIDIA GPU servers (A100, A40, and L40), an InfiniBand interconnect, and specialized storage designed for AI workloads.

Log in to the AI cluster

The AI cluster is accessible via SSH from a terminal. You must connect from the WCM network or have the VPN installed and enabled. Replace <cwid> with your CWID.

ssh <cwid>@ai-login01.med.cornell.edu
# or
ssh <cwid>@ai-login02.med.cornell.edu

Once logged on:

Last login: Fri Jan  3 11:35:53 2025 from 157.000.00.00
<cwid>@ai-login01:~$
<cwid>@ai-login01:~$ pwd
/home/<cwid>
<cwid>@ai-login01:~$ 
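
To avoid retyping the full hostname, the login above can be wrapped in an SSH client configuration. The following is a minimal sketch only: the alias `ai-cluster`, the file name `ai_cluster_ssh_config`, and the keep-alive interval are illustrative assumptions, not site settings.

```shell
# Write a sample SSH client config to a local file (hypothetical file name).
# Replace <cwid> with your CWID before using it.
CONFIG_FILE="./ai_cluster_ssh_config"
cat > "$CONFIG_FILE" <<'EOF'
Host ai-cluster
    HostName ai-login01.med.cornell.edu
    User <cwid>
    ServerAliveInterval 60
EOF

# Keep the config private to your user.
chmod 600 "$CONFIG_FILE"

echo "Connect with: ssh -F $CONFIG_FILE ai-cluster"
```

The same `Host` entry can instead be added to `~/.ssh/config`, in which case `ssh ai-cluster` is enough.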

Storage

The AI cluster has the following storage systems configured:

| Name | Mount point | Use | Is backed up? | Comment |
|---|---|---|---|---|
| Home | /home | Home filesystem. Used to keep small files, configs, code, scripts, etc. | No | Has limited space. Use it only for small files. |
| Midtier | /midtier/<labname> (each lab has an allocation under /midtier/<labname>/scratch/<cwid>) | Intended for data that is actively being used or processed, such as research datasets. | No | |
| AI GPFS | /bhii | TBD | No | Parallel file system for data-intensive workloads. Limited access, granted on special request. |
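
Because home space is limited and none of these filesystems is backed up, it can help to check usage before writing large files. A minimal sketch using standard tools follows; the `TARGET` variable is an assumption for illustration and defaults to the current directory, so run it from your home directory or from your scratch allocation.

```shell
# Directory to inspect; defaults to the current directory.
# Example: TARGET=/midtier/<labname>/scratch/<cwid> (replace the placeholders).
TARGET="${TARGET:-.}"

# Free and used space on the filesystem that holds TARGET.
df -h "$TARGET"

# Total size of TARGET itself (can be slow on large directory trees).
du -sh "$TARGET"
```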

Common File Management

# In the commands below, replace labname and cwid with your lab name and CWID.

# List all files and directories in the scratch directory
ls /midtier/labname/scratch/

# Navigate to a specific subdirectory
cd /midtier/labname/scratch/cwid

# Copy a file from the current directory to another directory
cp data.txt /midtier/labname/scratch/cwid/

# Move the copied file to a different directory
mv /midtier/labname/scratch/cwid/data.txt /midtier/labname/scratch/backup/

# Create a new directory (-p also creates any missing parent directories)
mkdir -p /midtier/labname/scratch/cwid/new_project/

Running jobs

SLURM Batch job example
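
A minimal sketch of a batch submission, assuming the cluster accepts the standard `sbatch` directives shown. The script name `train_job.sh`, the resource values, and the GPU request are illustrative assumptions, not site defaults. No partition is set because partition names are not listed on this page.

```shell
# Write a hypothetical batch script; every resource value is a placeholder to adjust.
cat > train_job.sh <<'EOF'
#!/bin/bash
# Job name shown in the queue
#SBATCH --job-name=example_train
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
# Request one GPU (assumes GPUs are exposed as a "gpu" generic resource)
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
# %x expands to the job name, %j to the job ID
#SBATCH --output=%x_%j.out

echo "Running on $(hostname)"
# List the GPU(s) allocated to the job
nvidia-smi
EOF

# Submit only where SLURM is available (for example, on a login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch train_job.sh
else
    echo "sbatch not found: run this on an AI cluster login node"
fi
```

On success, `sbatch` prints the job ID, and the job's output is written to a file named after the job name and ID in the submission directory.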

SLURM interactive job example
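
A minimal sketch of an interactive session using `srun --pty`, which opens a shell on a compute node once resources are granted. The resource values in `SRUN_ARGS` are illustrative assumptions, and no partition is set because partition names are not listed on this page.

```shell
# Hypothetical resource request; adjust to your needs.
SRUN_ARGS="--gres=gpu:1 --cpus-per-task=4 --mem=16G --time=01:00:00"

# Start an interactive shell only where SLURM is available.
if command -v srun >/dev/null 2>&1; then
    # SRUN_ARGS is intentionally unquoted so it splits into separate arguments.
    srun $SRUN_ARGS --pty bash -i
else
    echo "srun not found: run this on an AI cluster login node"
fi
```

Exiting the shell (`exit` or Ctrl-D) ends the job and releases the resources.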

Jupyter job
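
One common pattern is to start Jupyter inside a batch job and reach it through an SSH tunnel. The following is a sketch under several assumptions: `jupyter lab` is installed in your environment, port 8888 is free on the compute node, the compute node's hostname resolves from the login node, and tunneling through the login node is permitted. The script name `jupyter_job.sh` and all resource values are hypothetical.

```shell
# Write a hypothetical Jupyter batch script; adjust resources and port as needed.
cat > jupyter_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --output=jupyter_%j.log

PORT=8888
# Print the tunnel command to run from your workstation (replace <cwid>).
echo "Tunnel from your workstation with:"
echo "  ssh -L ${PORT}:$(hostname):${PORT} <cwid>@ai-login01.med.cornell.edu"

# Start Jupyter without opening a browser, listening on the compute node's hostname.
jupyter lab --no-browser --ip="$(hostname)" --port="${PORT}"
EOF

# Submit only where SLURM is available (for example, on a login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch jupyter_job.sh
else
    echo "sbatch not found: run this on an AI cluster login node"
fi
```

Once the job is running, the log file contains the tunnel command and the Jupyter URL with its access token. After opening the tunnel, browse to that URL with the host replaced by `localhost`.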

Monitoring job status
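
A short sketch of the standard SLURM commands for checking on jobs. The job ID `12345` is a placeholder, and `sacct` returns data only if job accounting is enabled on the cluster, which this page does not confirm.

```shell
# Hypothetical job ID; replace with the ID printed by sbatch.
JOB_ID="${JOB_ID:-12345}"

if command -v squeue >/dev/null 2>&1; then
    # Your pending and running jobs
    squeue -u "${USER:-$(id -un)}"

    # Full details for one job (state, nodes, requested resources)
    scontrol show job "$JOB_ID"

    # Accounting summary for a running or finished job (requires accounting)
    sacct -j "$JOB_ID" --format=JobID,JobName,State,Elapsed,MaxRSS

    # Uncomment to cancel the job
    # scancel "$JOB_ID"
else
    echo "SLURM commands not found: run this on an AI cluster login node"
fi
```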
