User Requests
Sometimes, users may request access to additional software packages, more disk quota, or longer SLURM job time limits. This document outlines the process of handling such requests.
Before making any changes, WATcloud admins should fully understand the request and consider its implications. The following questions are examples of what admins should consider:
Does the request make use of cluster resources in a fair and efficient manner?
For example, if a user requests more disk quota, the WATcloud admin should consider whether the type of data the user is storing is appropriate for the storage location. For instance, long-term storage of large datasets on SSD-backed storage may not be the most efficient use of resources.
Can the request be fulfilled without the help of WATcloud admins?
Sometimes, the requested work can be done by the users themselves. The WATcloud admin should point the user to the relevant documentation or resources, and improve the documentation if necessary. For example, if a user requests a software package that can be installed in user-space (e.g. Conda), the WATcloud admin should point the user to some options (e.g. miniconda).
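As an illustration of the user-space route, the following is a minimal sketch of how a user might install Miniconda into their own home directory without admin involvement. The function name `install_miniconda` and the default install prefix are assumptions made for this example, not part of the WATcloud documentation. The installer URL is the one Anaconda publishes for x86_64 Linux, and a different architecture would need a different installer.

```shell
#!/usr/bin/env bash
# Sketch only: install Miniconda into a user-writable prefix.
# install_miniconda and the default prefix are hypothetical choices for this example.
install_miniconda() {
  local prefix="${1:-$HOME/miniconda3}"
  local installer
  installer="$(mktemp)"
  # Download the installer published by Anaconda for x86_64 Linux.
  wget -q -O "$installer" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  # -b runs the installer non-interactively; -p sets the install prefix.
  bash "$installer" -b -p "$prefix"
  rm -f "$installer"
}
```

After calling `install_miniconda`, the user would typically run `"$HOME/miniconda3/bin/conda" init` once so that new shells pick up Conda.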
In subsequent sections, we will discuss specific user requests and how to handle them.
SLURM Job Time Limit Extension
Users may request an extension to the time limit of their SLURM [1] jobs. The time limits are in place to ensure that the cluster remains maintainable. For example, if the maximum job time limit is 7 days, then in the worst case we need to wait 7 days for a node to drain before performing maintenance [2] on it.
If a user requests an extension to the time limit and the request does not conflict with upcoming maintenance, then the WATcloud admin can extend the time limit as follows:
scontrol update job=<job_id> TimeLimit=<new_time_limit>
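The check and the update described above can be sketched as a few small shell helpers. This is a sketch under stated assumptions, not an established WATcloud procedure: the helper names (`show_limits`, `fits_before_maintenance`, `extend_job`) and the `DRY_RUN` variable are hypothetical, and the maintenance start time is assumed to be known to the admin as a Unix epoch timestamp.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are hypothetical.

# Print a job's ID (%i), current time limit (%l), and time left (%L).
show_limits() {
  squeue --noheader --jobs "$1" --format "%i %l %L"
}

# Succeed if a job that started at start_epoch (seconds) with a limit of
# limit_minutes would finish no later than maint_epoch (seconds).
fits_before_maintenance() {
  local start_epoch="$1" limit_minutes="$2" maint_epoch="$3"
  [ $(( start_epoch + limit_minutes * 60 )) -le "$maint_epoch" ]
}

# Set a job's time limit. With DRY_RUN=1 the command is printed instead of run.
extend_job() {
  local job_id="$1" new_limit="$2"
  local cmd=(scontrol update "JobId=${job_id}" "TimeLimit=${new_limit}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 extend_job 12345 3-00:00:00` prints the `scontrol` command that would set job 12345 to a 3-day limit, which lets the admin review it before running it for real. The job ID here is a placeholder.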
Software Package Installation
Users may request the installation of software packages that are not already available on the cluster. If the software package must be installed system-wide and the demand for it outweighs the cost of installing it (in terms of storage space and maintenance complexity), then the WATcloud admin can add the software package to the cluster Ansible configuration [3].
Footnotes
1. See the SLURM documentation for more information on the use of SLURM in the WATcloud cluster.
2. See the Maintenance Manual for information on performing maintenance on SLURM nodes.
3. At the time of writing, the Ansible configuration can be found here. Note that removing software packages is a separate step here.