To run work on the Sphyrna research cluster, you must create a job and submit it to our job scheduler. The scheduler ensures fair access to the HPC cluster by allocating resources efficiently across simultaneous jobs. CPU- or I/O-intensive work that is not submitted through the job scheduler may be terminated.
The job scheduler we use is called Slurm. This software enables us to share the cluster's large but finite compute resources across the NSU campus research community.
Depending on how you wish to use the cluster, there are two basic categories of jobs: batch jobs, which run unattended from a submission script, and interactive jobs, which give you a shell on a compute node.
Getting ready to submit:
Before submitting your job to the Slurm scheduler, you need to do a bit of planning. This may involve trial and error, for which interactive jobs may be helpful. The three most salient variables are the number of CPU cores (and GPUs) you need, the amount of memory, and the maximum run time; each maps directly onto an #SBATCH directive in the templates below.
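For example, a short interactive session for testing might look like the following; the partition, core count, memory, and time values are placeholders to adjust for your own work:
srun --partition=cpu --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00 --pty bash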
Example Slurm scripts:
1. GPU partition Slurm submission template
#!/bin/bash
#SBATCH --job-name=gpu_job_ # job name
#SBATCH --partition=gpu # partition to submit to
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=16 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=256G # total memory per node (4 GB per cpu-core is default)
#SBATCH --gres=gpu:2 # number of gpus per node
#SBATCH --time=1-10:00:00 # total run time limit (D-HH:MM:SS)
#SBATCH --error=gpu_job.%J.err # stderr file (%J expands to the job ID)
#SBATCH --output=gpu_job.%J.out # stdout file
2. CPU partition Slurm submission template
#!/bin/bash
#SBATCH --job-name=cpu_job_ # job name
#SBATCH --partition=cpu # partition to submit to
#SBATCH --nodes=4 # node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=16 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=256G # total memory per node (4 GB per cpu-core is default)
#SBATCH --time=1-10:00:00 # total run time limit (D-HH:MM:SS)
#SBATCH --error=cpu_job.%J.err # stderr file (%J expands to the job ID)
#SBATCH --output=cpu_job.%J.out # stdout file
3. CPU+GPU mix partition Slurm submission template
#!/bin/bash
#SBATCH --job-name=mix_job_ # job name
#SBATCH --partition=mix # partition to submit to
#SBATCH --nodes=5 # node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=16 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=256G # total memory per node (4 GB per cpu-core is default)
#SBATCH --gres=gpu:2 # number of gpus per node
#SBATCH --time=1-10:00:00 # total run time limit (D-HH:MM:SS)
#SBATCH --error=mix_job.%J.err # stderr file (%J expands to the job ID)
#SBATCH --output=mix_job.%J.out # stdout file
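To use one of these templates, copy it into a file (for example, my_job.sbatch, a placeholder name), append the commands your job should run after the #SBATCH directives, and submit it with sbatch:
sbatch my_job.sbatch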
Slurm Command Reference

Command | Purpose | Example
sinfo | View information about Slurm nodes and partitions | sinfo --partition investor
squeue | View information about jobs | squeue -u myname
sbatch | Submit a batch script to Slurm | sbatch myjob
scancel | Signal or cancel jobs, job arrays, or job steps | scancel jobID
srun | Run an interactive job | srun --ntasks 4 --partition investor --pty bash
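Putting these commands together, a typical session might look like the following; the script name, job ID, and username are placeholders:
sbatch my_job.sbatch # prints "Submitted batch job 12345"
squeue -u myname # check the job's state while it is queued or running
scancel 12345 # cancel the job if something went wrong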