- General Information
- How To Login
- Hardware and Networking
- Running Jobs / Slurm Scheduler
- Slurm Examples & Tips
- Example batch job to run in the partition: common
- Example MPI batch job to run in the partition: common
- To include or exclude specific nodes in your batch script
- Environment variables defined for tasks that are started with srun
- Use $HOME within your script rather than the full path to your home directory
- Copy your data to /tmp to avoid heavy I/O from your nfs mounted $HOME
- Software
General Information
- CUI-HPC is a joint HPC cluster shared by all departments
- Cluster access is restricted to CUI-HPC account holders
- Head node: master.hpc.rnd (access via ssh)
- The CUI-HPC deployment runs a RHEL 8.x clone
- Scheduler: Slurm 20.x
- Current Cluster Status: HPC_Status.
- Home directories are provided for each group from a single file server.
- Data is generally NOT backed up (check with your PI for details).
- Please send any questions and report problems to: hpc@cuiatd.edu.pk
How To Login
- To get started, log in to the head node master.hpc.rnd via ssh (see the example below).
- You will be prompted for your CUI account password.
- If you are unfamiliar with Linux and ssh, we suggest reading the Linux Tutorial and looking into how to Connect to Linux before proceeding.
- NOTE: Users should not run codes on the head node. Users who do so will be notified and have privileges revoked.
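A minimal login example from a terminal, assuming your CUI account name is your_username (replace it with your own):

ssh your_username@master.hpc.rnd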
Hardware and Networking
- The head node has 1.8TB local /scratch disk.
- Most compute nodes currently have 1 Gb/s network connections.
- Pool Hardware technical information.
Queue/Partition | Number of Nodes | Number of Cores/Threads | Node Name | Limits | Group Access |
CUI (default) | 8 | 64/128 | c[1-8] | walltime limit: 8 hours x 5 day | All Groups |
Running Jobs / Slurm Scheduler
The CUI HPC Slurm page explains what Slurm is and how to use it to run your jobs. Please take the time to read it, paying special attention to the parts that pertain to the types of jobs you want to run.
- NOTE: Users should not run codes on the head node. Users who do so will be notified and have privileges revoked.
A few Slurm commands to get familiar with initially:

sinfo -l
scontrol show nodes
scontrol show partition

Submit a job: sbatch testjob.sh
Start an interactive job: srun -p common --pty /bin/bash
Show a job: scontrol show job [job id]
Cancel a job: scancel [job id]
List your jobs: squeue -u userid
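For example, a typical submit-and-monitor sequence might look like the following sketch (testjob.sh and the job ID are placeholders):

sbatch testjob.sh            ## submit; Slurm replies with "Submitted batch job <job id>"
squeue -u $USER              ## list your pending and running jobs
scontrol show job <job id>   ## inspect a specific job in detail
scancel <job id>             ## cancel the job if it is no longer needed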
Slurm Examples & Tips
NOTE: All lines beginning with "#SBATCH" are directives for the scheduler to read. If you want such a line ignored (i.e. treated as a comment), you must place two "#" characters ("##") at the beginning of the line.
Example batch job to run in the partition: common
Example sbatch script to run a job with one task (default) in the ‘common’ partition (i.e. queue):
Please do not over-request resources.
#!/bin/bash
## -J sets the name of job
#SBATCH -J TestJob
## -p sets the partition (queue)
#SBATCH -p common
## 10 min
#SBATCH --time=00:10:00
## sets the tasks per core (default=2 for hyperthreading: cores are oversubscribed)
## set to 1 if one task by itself is enough to keep a core busy
#SBATCH --ntasks-per-core=1
## request 4GB per CPU (may limit # of tasks, depending on total memory)
#SBATCH --mem-per-cpu=4GB
## define job stdout file
#SBATCH -o testcommon-%j.out
## define job stderr file
#SBATCH -e testcommon-%j.err

echo "starting at `date` on `hostname`"

# Print the Slurm job ID
echo "SLURM_JOB_ID=$SLURM_JOB_ID"

echo "hello world `hostname`"

echo "ended at `date` on `hostname`"
exit 0
Submit/Run your job:
sbatch example.sh
View your job:
scontrol show job <job_id>
Example MPI batch job to run in the partition: common
Example sbatch script to run a job with 60 tasks in the ‘common’ partition (i.e. queue):
#!/bin/bash
## -J sets the name of job
#SBATCH -J TestJob
## -p sets the partition (queue)
#SBATCH -p common
## 10 min
#SBATCH --time=00:10:00
## the number of slots (CPUs) to reserve
#SBATCH -n 60
## the number of nodes to use (min and max can be set separately)
#SBATCH -N 3
## typically an MPI job needs exclusive access to nodes for good load balancing
#SBATCH --exclusive
## don't worry about hyperthreading, Slurm should distribute tasks evenly
##SBATCH --ntasks-per-core=1
## define job stdout file
#SBATCH -o testcommon-%j.out
## define job stderr file
#SBATCH -e testcommon-%j.err

echo "starting at `date` on `hostname`"

# Print Slurm job properties
echo "SLURM_JOB_ID = $SLURM_JOB_ID"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_JOB_CPUS_PER_NODE = $SLURM_JOB_CPUS_PER_NODE"

mpiexec -n $SLURM_NTASKS ./hello_mpi

echo "ended at `date` on `hostname`"
exit 0
To include or exclude specific nodes in your batch script
To run on a specific node only, add the following line to your batch script:
#SBATCH --nodelist=c0009    ## or the short form: #SBATCH -w c0009
To include one or more nodes that you specifically want, add the following line to your batch script:

#SBATCH --nodelist=<node_names_you_want_to_include>
## e.g., to include c0006:
#SBATCH --nodelist=c0006
## to include c0006 and c0007 (also illustrates shorter syntax):
#SBATCH -w c000[6,7]
To exclude one or more nodes, add the following line to your batch script:

#SBATCH --exclude=<node_names_you_want_to_exclude>
## e.g., to avoid c0006 through c0008, and c0013:
#SBATCH --exclude=c00[06-08,13]
## to exclude c0006 (also illustrates shorter syntax):
#SBATCH -x c0006
Environment variables defined for tasks that are started with srun
If you submit a batch job in which you run the following script with "srun -n $SLURM_NTASKS", you will see how the various environment variables are defined.
#!/bin/bash
echo "Hello from `hostname`," \
     "$SLURM_CPUS_ON_NODE CPUs are allocated here," \
     "I am rank $SLURM_PROCID on node $SLURM_NODEID," \
     "my task ID on this node is $SLURM_LOCALID"
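A minimal sketch of a batch script that launches the script above, assuming it is saved as env_report.sh and made executable (the file name and task count are placeholders):

#!/bin/bash
#SBATCH -J EnvTest
#SBATCH -p common
## request 4 tasks as an example
#SBATCH -n 4
#SBATCH --time=00:05:00
#SBATCH -o envtest-%j.out

## start one copy of the script per allocated task
srun -n $SLURM_NTASKS ./env_report.sh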
These variables are not defined in the same useful way in the environments of tasks that are started with mpiexec or mpirun.
Use $HOME within your script rather than the full path to your home directory
In order to access files in your home directory, you should use $HOME rather than the full path. To test, you could add the following to your batch script:
echo "my home dir is $HOME"

Then view the output file you set in your batch script to get the result.
Copy your data to /tmp to avoid heavy I/O from your nfs mounted $HOME !!!
- We cannot stress enough how important this is to avoid delays on the file systems.
#!/bin/bash
## -J sets the name of job
#SBATCH -J TestJob
## -p sets the partition (queue)
#SBATCH -p common
## time is HH:MM:SS
#SBATCH --time=00:01:30
#SBATCH --cpus-per-task=15
## define job stdout file
#SBATCH -o testcommon-%j.out
## define job stderr file
#SBATCH -e testcommon-%j.err

echo "starting $SLURM_JOBID at `date` on `hostname`"
echo "my home dir is $HOME"

## copy my data to a local tmp space on the compute node to reduce I/O
MYTMP=/tmp/$USER/$SLURM_JOB_ID
/usr/bin/mkdir -p $MYTMP || exit $?
echo "Copying my data over..."
cp -rp $SLURM_SUBMIT_DIR/mydatadir $MYTMP || exit $?

## run your job executables here...

echo "ended at `date` on `hostname`"
echo "copy your data back to your $HOME"
/usr/bin/mkdir -p $SLURM_SUBMIT_DIR/newdatadir || exit $?
cp -rp $MYTMP $SLURM_SUBMIT_DIR/newdatadir || exit $?

## remove your data from the compute node /tmp space
rm -rf $MYTMP
exit 0
Explanation: /tmp refers to a local directory that is found on each compute node. It is faster to use /tmp because when you read and write to it, the I/O does not have to go across the network, and it does not have to compete with the other users of a shared network drive (such as the one that holds everyone's /home). To look at files in /tmp while your job is running, you can ssh to the login node, then do a further ssh to the compute node that you were assigned. Then you can cd to /tmp on that node and inspect the files in there with cat or less. Note: if your application produces thousands of output files that you need to save, it is far more efficient to put them all into a single tar or zip file before copying them into $HOME as the final step.
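For example, a minimal sketch of bundling an output directory into one archive before copying it home (the directory name outputdir is a placeholder for whatever your application writes):

cd $MYTMP
## pack everything into a single compressed tar file named after the job ID
tar -czf results-$SLURM_JOB_ID.tar.gz outputdir
cp -p results-$SLURM_JOB_ID.tar.gz $SLURM_SUBMIT_DIR/ || exit $?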
Software
To see information about an available application/module, you can use the following commands.
module whatis <module name>
module show <module name>
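For example, using the GROMACS module from the software list below (any other module name from the list works the same way):

module whatis gromacs/2021.3
module show gromacs/2021.3
module load gromacs/2021.3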
Software List
Software | Path | Notes |
*GNU Compilers 9.x | /opt/ohpc/pub/compiler/gcc/9.4.0/ | module load gnu9/9.4.0 |
*GROMACS | /opt/ohpc/pub/gromacs-2021.3-hywuxpfkjufnid22gef4wvycpe43k4it | module load gromacs/2021.3 |
openmpi4/4.1.1 | /opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1/ | [Loaded by default] |
parallel python3.1.2 | /opt/ohpc/pub/py-mpi4py-3.1.2-swm7wjc7wnql3dxylxuzlm77nyinjrf6/ | module load py-mpi4py/3.1.2 |
julia 1.6.3 | /opt/ohpc/pub/julia-1.6.3-u26rz45chuhsi7vytaqqg46znrxnkoel/ | module load julia/1.6.3 |
namd-2.13 | /opt/ohpc/pub/namd-2.13-22megr6trb5gs46bzl6oqyrudqkwbjw4/ | module load namd/2.13 |
r-4.1.1 | /opt/ohpc/pub/r-4.1.1-cwmsejamd2xxeaixqm742zmfc3hvtdw5/ | module load R/4.1.1 |
openfoam2106 | /opt/ohpc/pub/openfoam-2106-yhjf542lei2kb6e2mtoi265i5fkhwojf/ | module load openfoam/2106 |
fftw 3.3.8 | /opt/ohpc/pub/libs/gnu9/openmpi4/fftw/3.3.8 | module load fftw/3.3.8 |
octave-6.3.0 | /opt/ohpc/pub/octave-6.3.0-syrc4hme425rtsvqtsnuhplcmst2kk5q/ | module load octave/6.3.0 |
cp2k-8.2 | /opt/ohpc/pub/cp2k-8.2-q47dgtkactxnckft2fiityexymnss4x2 | module load cp2k/8.2 |
Quantum Espresso | /opt/ohpc/pub/quantum-espresso-6.7-i2rfc4d5hb24irmmjqcdzr4khs2kbqgd/ | module load quantum-espresso/6.7 |
mrbayes3.2.7 | /opt/ohpc/pub/mrbayes-3.2.7a-o2f5l7ri4xn7jocn25sopmoqfgsl77fq/ | module load mrbayes/3.2.7a |
Autodock-vina 1.1.2 | /opt/ohpc/pub/autodock-vina-1_1_2-tkpotqcnhhxl3astwnuddifkqjecdtl7/ | module load autodock-vina/1.1.2 |
nwchem 7.0.2 | /opt/ohpc/pub/nwchem-7.0.2-obhytoj5qdiitfso22gbekcejh24c6l6/ | module load nwchem/7.0.2 |
megadock 4.0.3 | /opt/ohpc/pub/megadock-4.0.3-gh7iintjaqs236p4evcdqssq3yjlrtdc/ | module load megadock/4.0.3 |
glimmer 3.02 | /opt/ohpc/pub/glimmer-3.02b-3r6cicsavye6fscdjj5rrvnzl2asvu6n/ | module load glimmer/3.02b |
hmmer 3.3.2 | /opt/ohpc/pub/hmmer-3.3.2-pofbm562w5j4osbulwenj246rjclyowo/ | module load hmmer/3.3.2 |
intel 2022 Compiler | /opt/ohpc/pub/moduledeps/oneapi/compiler/2022.0.2 | module load intel/2022.0.2 |
intel 2021 mpi | /opt/ohpc/pub/moduledeps/oneapi/mpi | module load impi/2021.5.1 |
Siesta 4.0.2 | /opt/ohpc/pub/siesta-4.0.2-hmsyqj3ct6kfpq4vx2ywechlqhqxrmlw/ | module load siesta/4.0.2 |
Octopus 10.5 | /opt/ohpc/pub/octopus-10.5-5qfbbhrl7qqtk26jk6k5gtyq6f32k3q3/ | module load octopus/10.5 |
Grackle 3.1 | /opt/ohpc/pub/grackle-3.1-znzfbcwlxtzqfbbgrlgrpdhuafmmwdds/ | module load grackle/3.1 |
Openmolcas 21.02 | /opt/ohpc/pub/openmolcas-21.02-pzsgie3xdjj62ggdnse2v7b63iaoqjgl/ | module load openmolcas/21.02 |
Qbox 1.63.7 | /opt/ohpc/pub/qbox-1.63.7-orhmhzzgkduekoekhnbflhavlve6uv7k/ | module load qbox/1.63.7 |
Abinit 9.4.2 | /opt/ohpc/pub/abinit-9.4.2-omv7spxmdsaq7lkbkeu7jib6phd3h73t/ | module load abinit/9.4.2 |
Orca 5.0.3 | /opt/ohpc/pub/orca/ | module load orca/5.0.3 |
AbySS 2.3.1 | /opt/ohpc/pub/abyss-2.3.1-p4r55f4zob54w6gkwbowsbcev626nn2i/ | module load abyss/2.3.1 |
citcoms 3.3.1 | /opt/ohpc/pub/citcoms-3.3.1-gtxacpetuyz365eqoebh7rxjx4e47y3a/ | module load citcoms/3.3.1 |
dalton 2020.0 | /opt/ohpc/pub/dalton-2020.0-cizlv73dkdwwdqqbiwupgvhmm6lwowjo/ | module load dalton/2020.0 |
grace 5.1.25 | /opt/ohpc/pub/grace-5.1.25-4ai6dt6bh3ho4utxhbie2vjg3wos3ryw/ | module load grace/5.1.25 |
meep 1.23 | /opt/ohpc/pub/meep-1.23.0-jbslzff4l5hsovc573zaukcxjrqwh2me/ | module load meep/1.23.0 |
plumed 2.6.3 | /opt/ohpc/pub/plumed-2.6.3-vcgk747gbnl7iemvywvptnohlbbuw7as/ | module load plumed/2.6.3 |
ray 2.3.1 | /opt/ohpc/pub/ray-2.3.1-bwvgsiasbbae6x4ywo74odt2p3npjhgq/ | module load ray/2.3.1 |
paraview 5.9.1 | /opt/ohpc/pub/paraview-5.9.1-yhzltojrqzpgc5dawydnbozsi7m7fkjy/ | module load paraview/5.9.1 |
yambo 4.2.2 | /opt/ohpc/pub/yambo-4.2.2-txadlxyhuwauice66rkn5cxgzbqk74wc/ | module load yambo/4.2.2 |
wannier90 3.1.0 | /opt/ohpc/pub/wannier90-3.1.0-sp7tvc4e65c32ditzi4l44c5jnsunfao/ | module load wannier90/3.1.0 |
mrchem 1.0.2 | /opt/ohpc/pub/mrchem-1.0.2-kk73zcps5r26du7tbdvqadv6xcjkbrqh/ | module load mrchem/1.0.2 |
fleur 5.1 | /opt/ohpc/pub/fleur-5.1-ghzomtcqzrwsx4f26gpfntlmtwvgxqlf/ | module load fleur/5.1 |
elk 7.2.42 | /opt/ohpc/pub/elk-7.2.42-ucs4xuwh7z3byqqdl7x2btqtyshgx3jj/ | module load elk/7.2.42 |
berkeleygw 3.0.1 | /opt/ohpc/pub/berkeleygw-3.0.1-vfiyebywqtbj6phcz2jvihqa6omqk42a/ | module load berkeleygw/3.0.1 |
arbor 0.5.2 | /opt/ohpc/pub/arbor-0.5.2-y2tk4tn4isa6gn5sllpq76oyfkvtqbnj/ | module load arbor/0.5.2 |
parmetis 4.0.3 | /opt/ohpc/pub/parmetis-4.0.3-l52lags7s3qytstaxsi3g5pkrty6vzfh/ | module load parmetis/4.0.3 |
tinker 8.7.1 | /opt/ohpc/pub/tinker-8.7.1-4s223pn6x2uzwlivj3ntmnb7wvzmsq7f/ | module load tinker/8.7.1 |
shengbte 1.1.1 | /opt/ohpc/pub/shengbte-1.1.1-8a63749-m6pvjmtvlmgtanlznpcfamkr4jmggynt/ | module load shengbte/1.1.1 |
molden 6.7 | /opt/ohpc/pub/molden-6.7-nw7rogmtgdpk5jorakgkczkdqhewgbh7/ | module load molden/6.7 |
trimmomatic 0.39 | /opt/ohpc/pub/trimmomatic-0.39-gs5zi66bwnjqzprbkilvvvu57qnmgt3u/ | module load trimmomatic/0.39 |
cpmd 4.3 | /opt/ohpc/pub/cpmd-4.3-bbcr6dsvyxhc4foysptgn54bja4alxyd/ | module load cpmd/4.3 |
orca 5.0.3-ompi411 | /opt/ohpc/pub/orca-5.0.3-q27omssb2m4qm7oz2msmxemtxrejc265/ | module load orca/orca-5.0.3-ompi411 |
emboss 6.6 | /opt/ohpc/pub/emboss-6.6.0-arjmvridscw7ygwnpylrmvlh4j4gicj5/ | module load emboss/6.6 |
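As a usage sketch, a module from the list above is typically loaded inside a batch script before the corresponding program is run. The example below assumes the GROMACS module and an input file topol.tpr that you supply; the exact binary name (here gmx_mpi) depends on how GROMACS was built on the cluster.

#!/bin/bash
#SBATCH -J gmx_test
#SBATCH -p common
#SBATCH -n 16
#SBATCH --time=01:00:00
#SBATCH -o gmx-%j.out

## load the software environment from the list above
module load gromacs/2021.3

## run the MPI-enabled GROMACS driver on the allocated tasks (binary name assumed)
mpiexec -n $SLURM_NTASKS gmx_mpi mdrun -s topol.tpr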