What is Slurm?

Slurm (previously the Simple Linux Utility for Resource Management) is a modern, open-source job scheduler that is highly scalable and customizable; it is currently used on the majority of the TOP500 supercomputers. Job schedulers enable large numbers of users to share large computational resources fairly and efficiently.


...

Cluster prerequisites

Before you can take advantage of our computational resources, you must first set up your environment. This is straightforward, but there are a few steps:

SSH access setup

You need to have your SSH keys set up to access cluster resources. If you haven't done this already, please set up your SSH keys.

Warning

Ensure SSH keys are configured for proper access to the Slurm submit host, curie.pbtech.
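
A quick way to confirm that key-based access works (a sketch; replace <CWID> with your own login ID) is to log in to the submit host directly:

Code Block
# should open a shell on curie.pbtech without prompting for a password
ssh <CWID>@curie.pbtech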


...

SCU clusters and job partitions

Available SCU HPC resources

The SCU uses Slurm to manage the following resources:

General purpose cluster:

  • The panda cluster (72 nodes): 70 CPU-only nodes intended for general use

PI-specific clusters:

  • The Edison cluster (9 GPU nodes):  5 k40m and 4 k80 nodes reserved for the H. Weinstein lab


All jobs, except those submitted to the Edison cluster, should be submitted via our Slurm submission node, curie.pbtech. Jobs submitted to the Edison cluster should be submitted from its own submission node, edison-mgmt.pbtech.
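
For example (a minimal sketch; <CWID> and job.sh are placeholders), a general-purpose job is submitted from curie.pbtech, while an Edison job is submitted from edison-mgmt.pbtech:

Code Block
# general-purpose jobs: log in to the Slurm submission node, then submit
ssh <CWID>@curie.pbtech
sbatch job.sh

# Edison jobs: submit from the Edison submission node instead
ssh <CWID>@edison-mgmt.pbtech
sbatch job.sh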

Warning

Unless you perform cryoEM analysis or otherwise have specific PI-granted privileges, you will only be able to submit jobs to the panda cluster.

Please see About SCU for more information about our HPC infrastructure.

Slurm partitions - Greenberg Cluster 

Slurm groups nodes into sets called 'partitions'. Each of the resources above belongs to one or more partitions, and each partition has its own job submission rules. Some nodes belong to multiple partitions; this gives the SCU the flexibility needed to allocate managed resources fairly.
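
To inspect a single partition's node list, time limit, and other settings, scontrol can be used (a sketch; panda is just one of the partitions listed below):

Code Block
scontrol show partition panda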

Greenberg

Panda cluster partitions:

  • panda: 70 CPU-only nodes, 7-day runtime limit

Edison cluster:

  • edison: 9 GPU nodes, 2-day runtime limit
  • edison_k40m: 5 GPU (k40m) nodes, 2-day runtime limit
  • edison_k80: 4 GPU (k80) nodes, 2-day runtime limit
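
The runtime limits above cap the wall time a job may request with --time. As a minimal sketch (the 3-day value is arbitrary), a batch script targeting the panda partition could include:

Code Block
#SBATCH --partition=panda
# wall time request; must stay within the partition's 7-day limit
#SBATCH --time=3-00:00:00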


Slurm commands can only be run on the Slurm submission host, curie.pbtech (Greenberg).

...

SCU cluster partitions:

  • scu-cpu: 22 CPU nodes, 7-day runtime limit
  • scu-gpu: 6 GPU nodes, 2-day runtime limit

CryoEM cluster partitions:

  • cryo-cpu: 14 CPU-only nodes, 7-day runtime limit
  • cryo-gpu: 6 GPU nodes, 2-day runtime limit
  • cryo-gpu-v100: 2 GPU nodes, 2-day runtime limit
  • cryo-gpu-p100: 3 GPU nodes, 2-day runtime limit

PI-specific cluster partitions:

  • accardi_gpu: 4 GPU nodes, 2-day runtime limit
  • accardi_cpu: 1 CPU node, 7-day runtime limit
  • boudker_gpu: 2 GPU nodes, 7-day runtime limit
  • boudker_gpu-p100: 3 GPU nodes, 7-day runtime limit
  • boudker_cpu: 2 CPU nodes, 7-day runtime limit
  • sackler_cpu: 1 CPU node, 7-day runtime limit
  • sackler_gpu: 1 GPU node, 7-day runtime limit
  • hwlab-rocky_gpu: 12 GPU nodes, 7-day runtime limit
  • sackler_eliezer-gpu: 1 GPU node, 7-day runtime limit

Other specific cluster partitions:

  • covid19scu-res: 1 GPU node, 7-day runtime limit


The above will be updated as needed; regardless, to see an up-to-date description of all available partitions, run the sinfo command on scu-login02. For a description of each node's CPU core count, memory (in MB), runtime limit, and partition, use this command:
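
For example, a node-oriented sinfo call along these lines reports those fields (a sketch; the format string can be adjusted):

Code Block
# one line per node: node name, partition, CPUs, memory (MB), time limit
sinfo -N -o "%N %P %c %m %l"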

...

Code Block
srun -n1 --pty --partition=scu-cpu --mem=8G bash -i
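
Here -n1 requests a single task, --pty attaches a pseudo-terminal, --partition selects the partition to run in, --mem=8G reserves 8 GB of memory, and bash -i starts an interactive shell on the allocated node.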


To request a specific number of GPUs, add the request to your srun/sbatch command:

Below is an example of requesting 1 GPU; you can request up to 4 GPUs on a single node:

Code Block
--gres=gpu:1
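
For instance (a sketch; the scu-gpu partition is used purely as an illustration), the flag can be appended to the interactive srun command shown above or placed in a batch script:

Code Block
# interactive session with one GPU
srun -n1 --pty --partition=scu-gpu --gres=gpu:1 --mem=8G bash -i

# or, inside an sbatch script, request two GPUs on one node
#SBATCH --gres=gpu:2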


...

A simple job submission example

...