HPC Usage
Do NOT run compute-heavy jobs on login nodes.
Run a small, test-scale job first before submitting large batch jobs.
Be respectful of others' computing resources; if you have thousands of small jobs, use a Job Array (see the sketch below).
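For reference, a minimal Slurm job array sketch for the HPE cluster; the task count, resource requests, and file-naming scheme below are placeholder assumptions to adapt to your own data:
#!/bin/bash
#SBATCH --job-name=small_tasks
#SBATCH --array=1-1000%50      # 1000 tasks, at most 50 running at once
#SBATCH --time=1:00:00
#SBATCH --mem=2g
# Each array task processes one input file (naming scheme is a placeholder)
INPUT=input_${SLURM_ARRAY_TASK_ID}.txt
./process_one_file "$INPUT"
Submit this once with sbatch and Slurm schedules all tasks under a single job ID, which is far friendlier to the scheduler than thousands of separate submissions.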
HPC Quick Start Guide:
Research HPC Support Request:
Cedars Tutorial:
Department of Computational Biomedicine's guide: [Excellent resource; updated periodically]
As of November 2022, there are two HPCs at Cedars-Sinai. The old HPC, named Cisco, was built in 2013; the new HPC, named HPE, was built in 2022.
The Cisco cluster has decent CPU power and RAM. Consider using it for typical bioinformatics workflows, such as STAR alignment and RNA-seq processing.
HPE is the new cluster built in 2022, managed by Slurm, and equipped with cutting-edge GPUs (2x A100 dedicated to our group, plus 8x A100 and 4x V100 as shared resources).
Read below to learn how to best navigate these two clusters.
A typical HPC is an aggregation of several computers called nodes. Each node has its own CPU and RAM, but storage is shared across all nodes on the same cluster (i.e., regardless of which node you are logged into, all of your directories look the same). Groups of nodes are dedicated to particular use cases; there are three primary types of nodes on the Cedars HPC. Only the Transfer Nodes have high-speed internet connectivity, so be sure to use them for transferring large data.
Submit Node: Logging in, submitting jobs, requesting interactive nodes; no heavy compute.
on Cisco: csclprd3-s00[1,2,3]v.csmc.edu
on HPE: esplhpccompbio-lv0[1,3].csmc.edu
Transfer Node: Downloading and transferring large files (see the example after this list).
on HPE: hpc-transfer01.csmc.edu
Compute Node: Workhorse for execution of programs and data analysis.
on Cisco: csclprd3-c[XXX]
on HPE: esplhpc-cp[XX]
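For example, to copy a large dataset into the lab's /common space through the HPE transfer node (the local path and username below are placeholders):
# run from your local machine, replacing username and paths with your own
rsync -avP ./big_dataset/ username@hpc-transfer01.csmc.edu:/common/zhangz2lab/username/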
The storage in HPC is called Network Attached Storage, or NAS. It is shared (i.e., mounted/attached to each node) across all nodes on the same cluster. The only storage shared across the two clusters (2013 Cisco and 2022 HPE) is the /common folder; every user has a shortcut named ~/common that points to their /common/ folder.
A fast scratch storage of 2TB per user is available on the 2013 Cisco cluster at /scratch/username. It has faster I/O and is therefore suitable for short-turnaround experiments. However, note that any files older than 7 days are automatically deleted from scratch!
The Cisco cluster has decent CPU power and RAM. Consider using it for typical bioinformatics workflows, such as STAR alignment and RNA-seq processing.
Add the following lines to your $HOME/.ssh/config
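A minimal sketch, assuming you log in through one of the Cisco submit nodes listed above; replace the username with your own Cedars account:
Host csmc
    HostName csclprd3-s001v.csmc.edu   # or another submit node from the list above
    User your_cedars_username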
Then ssh with ssh csmc in your terminal. You need to be on the Cedars intranet or VPN.
HPE is the new cluster built in 2022, managed by Slurm, and equipped with cutting-edge GPUs (8x V100). However, it can get crowded at times when many jobs are submitted at once.
In general, you can search online for the specific usage of commands such as qsub, qrsh, and qstat; those resources apply to most SGE (Sun Grid Engine) and similar managed job systems. Below are a few commands for day-to-day use:
Getting an interactive node
On HPE, put these in your .bash_aliases:
alias salloc-gpu="salloc --gpus=v100:1 --time=1-0 --mem=8g"
alias salloc-cpu="salloc -c 8 --time=1-0 --mem=8g"
On Cisco: alias qrsh-cpu="qrsh -l h_rt=24:00:00,h_mem=8g"
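On HPE, a typical interactive session then looks something like the following (a sketch; whether salloc drops you directly onto the compute node depends on the site's Slurm configuration):
source ~/.bash_aliases          # reload aliases in the current shell
salloc-gpu                      # request 1x V100, 8 GB RAM, for 1 day
srun --pty bash                 # step onto the allocated node if salloc did not already
nvidia-smi                      # confirm the GPU is visible
exit                            # release the allocation when finished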
Submit a CPU/GPU job
On HPE, use the following template
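A minimal sketch of such a template; the job name, resource requests, and final command are placeholder assumptions to adapt to your analysis:
#!/bin/bash
#SBATCH --job-name=my_job          # rename for your analysis
#SBATCH --time=1-0                 # 1 day of wall time
#SBATCH --cpus-per-task=8
#SBATCH --mem=32g
#SBATCH --gpus=v100:1              # remove this line for a CPU-only job
#SBATCH --output=%x_%j.log         # log named <jobname>_<jobid>.log
# your commands go here, for example:
python run_analysis.py
Save it as job.sh, submit with sbatch job.sh, and monitor with squeue -u $USER.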
For Cedars HPC documentation, see below:
A lab-shared storage partition can be requested by PIs for their group members. Right now, the Zhang lab has 90TB of shared storage at /common/zhangz2lab/. This is also shared between the two clusters. For convenience, you can link it into your home directory with ln -s /common/zhangz2lab $HOME/zhanglab.
Tip: since /common/ is shared between the two clusters, you can install your local software in this folder and have your .bashrc point to it, so that you have the same working environment on both HPCs.
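For instance, one common pattern (the install path below is an assumption) is to install Miniconda under /common and add it to your PATH in .bashrc:
# one-time install into the shared /common space (path is a placeholder)
bash Miniconda3-latest-Linux-x86_64.sh -b -p /common/zhangz2lab/$USER/miniconda3
# add to ~/.bashrc on both clusters
export PATH=/common/zhangz2lab/$USER/miniconda3/bin:$PATH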
Generally, SSH (see the section below) is used to log in to the Submit Node. SSH is available by default in Linux/macOS terminals and can also be installed on Windows.
Once you are on a Submit Node, you can submit jobs and request resource allocations using the job management system (see the section following SSH).
(you need to be on the Cedars intranet)
Setting up VSCode to work with a remote server:
Requesting more lab-wide network-attached storage at Cedars HPC [only needed if our current storage space is not enough and/or you have large files incoming; ask Frank if you are not sure]: