Tips, tricks, and being a good HPC citizen

Being a good HPC citizen

Number of jobs

Please do not flood the queues with a large number of jobs at the same time.

The FairShare algorithm will generally ensure that every user and group gets the appropriate share of the computer by reducing the priority of jobs belonging to users/groups who have used more than their share in the recent past. It does, however, take time to adjust, so submitting a large number of jobs at once may block the cluster for other users. One way of preventing your jobs from blocking too many nodes is to queue them up behind each other by creating a dependency. If you add the following line to the SLURM directives of your job script:

#SBATCH -d afterany:<previous-job>

this job will not start until the job with the number <previous-job> has finished, whether or not it succeeded. You can also add the dependency after the job has been submitted (only if it has not started yet) with the scontrol command:

scontrol update jobid=<job-to-be-delayed> dependency=afterany:<previous-job>

In both cases, replace <previous-job> and <job-to-be-delayed> with their jobid (the number in the first column shown by squeue).
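The dependency mechanism described above can also be set up at submission time for a whole series of jobs. The following is a minimal sketch, not a site-provided tool: the function name submit_chain and the script names in the usage line are hypothetical. It relies on the --parsable option of sbatch, which prints only the job ID (optionally followed by ";" and a cluster name).

```shell
#!/bin/bash
# Submit job scripts so that each one waits for the previous one to end.
# Hypothetical helper; usage: submit_chain step1.sh step2.sh step3.sh
submit_chain() {
    local prev="" jobid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job in the chain: no dependency.
            jobid=$(sbatch --parsable "$script") || return 1
        else
            # Later jobs: start only after the previous job has ended,
            # regardless of its exit state (afterany).
            jobid=$(sbatch --parsable --dependency=afterany:"$prev" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        prev=${jobid%%;*}
        echo "$script -> job $prev"
    done
}
```

With this approach only one job of the chain is eligible to run at any time, so the remaining jobs wait in the queue without occupying nodes.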

Disk usage

Home directories

The home directories are backed up, but our backup capacity is limited. We therefore ask that only essential and irreplaceable data, such as source code or input files, is stored there. There is currently no enforcement of quota, but if your data exceeds 20 GB, it will not be backed up.
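To see whether a home directory is still below the backup limit, its size can be checked with du. The sketch below is only an illustration: the function name check_backup_size is hypothetical, and the 20 GB limit is interpreted as 20 x 1024 x 1024 KB, which is an assumption about how the limit is counted.

```shell
#!/bin/bash
# Hypothetical helper: compare a directory's size with the 20 GB backup limit.
# Assumption: the limit is counted as 20 * 1024 * 1024 kilobytes.
# Usage: check_backup_size "$HOME"
check_backup_size() {
    dir=${1:-$HOME}
    limit_kb=$((20 * 1024 * 1024))
    # du -sk prints the total size in kilobytes, then a tab and the path.
    used_kb=$(du -sk "$dir" | cut -f1)
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "$dir: $used_kb KB used, over the backup limit"
    else
        echo "$dir: $used_kb KB used, within the backup limit"
    fi
}
```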

Scratch directories

Calculations creating large amounts of data should be run in the scratch filesystem. You will have a directory at /scratch/<group>/<user>. This is not backed up, and it is the user's responsibility to secure their data elsewhere. In practice, file space there is also limited and some groups may be close to filling up their partition. kennedy is not meant for long-term storage, so please regularly delete data that is no longer needed on kennedy.
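One way to find candidates for deletion is to list files that have not been modified for some time. The sketch below uses the standard find command; the function name list_old_scratch_files and the default age of 30 days are assumptions chosen for illustration. It only lists files and deletes nothing, so the output can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper: list regular files not modified in the last N days.
# Usage: list_old_scratch_files /scratch/<group>/<user> 30
list_old_scratch_files() {
    dir=$1
    days=${2:-30}   # assumed default age threshold in days
    # -mtime +N matches files last modified more than N days ago.
    find "$dir" -type f -mtime +"$days" -print
}
```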