High performance computing
High performance computing clusters (HPCs, often just called ‘clusters’) are networks of computing servers designed to provide a shared resource for computationally intensive programs that users can borrow as needed. Where an individual may be limited by the hardware of their laptop or the budget of their lab, the HPC is free to use, and users can request very large amounts of RAM, CPUs, GPUs, disk space, etc. HPCs use job-scheduling software such as SLURM or Sun Grid Engine (SGE) to manage users’ requests and allocate resources. Compute jobs are submitted to the cluster with an sbatch script that registers the job with SLURM. HPCs are great for running expensive processing scripts, but you can also run interactive sessions and GUI notebook servers like RStudio and Jupyter with high RAM and quick file access.
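As a minimal sketch, an sbatch script looks something like the following. The job name, resource values, time limit, and the script being run are all placeholders; check SciComp's documentation for the actual partitions and limits available on the cluster.
#!/bin/bash
#SBATCH --job-name=example        ## name shown in the queue
#SBATCH --cpus-per-task=4         ## CPUs to request
#SBATCH --mem=16G                 ## RAM to request
#SBATCH --time=02:00:00           ## wall-clock time limit
#SBATCH --output=example_%j.log   ## stdout/stderr; %j is the job ID

## commands to run on the allocated node (placeholder)
python process_data.py
Submit it with sbatch example.sbatch and monitor it with squeue -u $USER.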
Getting access to the HPC
Users can use the Sci-comp self-service portal to request access to the cluster.
Interactive sessions on HPC
Using screen together with grabnode gives you a persistent, long-term interactive session on a compute node that survives disconnects.
## use screen or something similar to avoid crashing your session on disconnect
screen
## grab a node for interactive use. Follow the interactive prompts
grabnode
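If your SSH connection drops, the screen session (and anything running inside it) keeps going; you can reattach to it later. Note that if the cluster has multiple login nodes, you need to log back in to the same node where you started the session.
## list existing screen sessions
screen -ls
## reattach to a detached session (add the session name/ID if you have more than one)
screen -r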
Both RStudio and JupyterLab can be run on the HPC and accessed from your local computer browser, allowing you to leverage the HPC resources and connect to the shared drives.
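If the compute node is not directly reachable from your laptop's browser (for example, off the campus network without VPN), an SSH tunnel from your local machine is one way to bridge the gap. The hostnames and port below are placeholders; use the node and port that your session actually reports.
## run on your LOCAL machine: forward a local port to the node/port the server is listening on
ssh -N -L 8888:NODE_HOSTNAME:PORT your_username@rhino
## then open http://localhost:8888 in your browser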
Jupyter
Recommended: Use the FHIL Jupyter Lab Launcher
The FHIL Jupyter Lab Launcher provides a streamlined way to run Jupyter Lab on the HPC. This allows you to configure your own resource allocation and control the system libraries with greater permissions (by editing the image definition).
# Clone the repository
git clone https://github.com/Fred-Hutch-Innovation-Lab/jupyter-lab-launcher.git
cd jupyter-lab-launcher
# Submit the job (automatically pulls container and sets up networking)
sbatch launch_jupyter_lab.sh
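Once the job starts, the connection details (node, port, and login info) end up in the job's output log. The log filename below is an assumption; check the launch script for the name it actually uses.
## check that the job is running
squeue -u $USER
## read the connection instructions from the job's output log (filename is an assumption)
cat slurm-*.out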
Alternative: Traditional module-based approach
You can also use the Rhino modules with Jupyter installed, or set up your own environment with UV. This works, but it is a little sloppier for recording the environment used, managing screen sessions, etc.
## use screen or something similar to avoid crashing your session on disconnect
screen
## grab a node for interactive use. Follow the interactive prompts
grabnode
## with UV (run `uv init` first if you are not already in a uv project)
uv add jupyter
uv run --with jupyter jupyter lab --ip=$(hostname) --port=$(fhfreeport) --no-browser
## with modules
ml JupyterLab/4.0.3-GCCcore-12.2.0 Seaborn/0.12.2-foss-2022b scikit-learn/1.2.1-gfbf-2022b
jupyter lab --ip=$(hostname) --port=$(fhfreeport) --no-browser
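Either way, jupyter lab prints one or more URLs containing the node hostname, port, and an access token; copy one into your browser (or point an SSH tunnel at that host and port, as above). If you lose the printout, you can recover it from the running server on the same node:
## list running Jupyter servers and their URLs/tokens
jupyter server list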
RStudio
Recommended: Use the FHIL RStudio Server Launcher
The FHIL RStudio Server Launcher provides a streamlined way to run RStudio Server on the HPC. This allows you to configure your own resource allocation and control the system libraries with greater permissions (by editing the image definition).
# Clone the repository
git clone https://github.com/Fred-Hutch-Innovation-Lab/rstudio-server-launcher.git
cd rstudio-server-launcher
# Submit the job (automatically pulls container and sets up networking)
sbatch launch_rstudio_server.sh
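As with the Jupyter launcher, check the job's output log for the URL and login credentials once the job starts (the exact filename depends on the launch script), and release the node when you are finished. JOBID below is a placeholder for the ID shown by squeue.
## confirm the job is running and note its job ID
squeue -u $USER
## when you are done, cancel the job to free the resources
scancel JOBID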
Alternative: FH rstudio launcher
This will be faster to set up, but has more limits on the system libraries you can install. Also, if you want a reproducible compute environment for publishing, you’ll have to contact SciComp and ask them for the details of the image (probably not that hard, but I prefer to just build it myself).
- Rstudio server launcher (legacy)