Compute nodes will no longer be shared among multiple users. Instead, when a user is allocated a compute node, they will be the only user allowed to access it. This is being implemented for both security and performance reasons: when multiple users share the same node, performance can suffer due to resource contention. While we will no longer schedule jobs from different users on the same node, users are encouraged to take advantage of tools such as GNU parallel to co-schedule their multiple independent tasks on the compute nodes allocated to them.
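To illustrate what co-scheduling independent tasks on one allocated node can look like, here is a minimal Python sketch. It is not GNU parallel itself, only a rough analogue of the same idea: a list of independent commands is run with a fixed number of slots active at once. The function names run_command and run_many, and the toy workload, are hypothetical and chosen for illustration only.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one independent task as a child process; return (exit code, stdout)."""
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.returncode, done.stdout.strip()


def run_many(commands, slots):
    """Run the commands with at most `slots` of them active at any one time."""
    # The threads only wait on child processes, so the real work runs
    # in separate processes that the OS can place on separate cores.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # map() returns results in the same order as the input commands.
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Hypothetical workload: eight independent tasks, four at a time.
    tasks = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(8)]
    for code, out in run_many(tasks, slots=4):
        print(code, out)
```

In practice the number of slots would be chosen to match the cores and memory of the node, so that the tasks of a single user do not contend with each other.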
Each user will be limited to 10 active jobs at a given point in time and will be limited to running these jobs on a maximum of 20 compute nodes. As each compute node is dual-socket and has a 20-core processor on each socket (40 cores per node), a total of 800 cores (20 nodes × 40 cores) could potentially be in use by a user at a given point in time.
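The limits above can be checked for a planned set of jobs with a few lines of Python. This is only a sketch under one assumption: that the 20-node cap applies to the combined node count of all of a user's active jobs. The name check_plan is hypothetical.

```python
CORES_PER_NODE = 2 * 20  # dual-socket, 20 cores per socket
MAX_ACTIVE_JOBS = 10
MAX_NODES = 20           # assumed to apply across all active jobs of one user


def check_plan(nodes_per_job):
    """Return (within_limits, total_nodes, total_cores) for per-job node counts."""
    total_nodes = sum(nodes_per_job)
    within_limits = (
        len(nodes_per_job) <= MAX_ACTIVE_JOBS and total_nodes <= MAX_NODES
    )
    return within_limits, total_nodes, total_nodes * CORES_PER_NODE


if __name__ == "__main__":
    # Ten jobs of two nodes each reach both limits exactly.
    print(check_plan([2] * 10))
```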
Each job will be limited to a run time of no more than 72 hours. Users are encouraged to consider implementing checkpoint-restart capabilities in their home-grown applications. The research computing support group will be happy to provide guidance on implementing a checkpoint-restart mechanism in users' code. Some third-party software, like the FLASH astrophysics code, already has built-in checkpoint-restart capabilities. Such capabilities can be enabled by setting the required environment variables. Users are encouraged to review the documentation of their software to confirm whether checkpoint-restart functionality is available in the software of their choice.
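For home-grown code, a checkpoint-restart mechanism can be as simple as periodically saving the loop state and reloading it at startup. The following is a minimal Python sketch, not a recommended implementation for any particular application. The function names, the JSON file format, and the toy running-sum workload are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interrupted job never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp, path)  # atomic rename within the same filesystem


def load_checkpoint(path, default):
    """Return the saved state, or `default` when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as handle:
        return json.load(handle)


def run(path, total_steps, every=100):
    """Toy loop: accumulate a running sum, resuming from the last checkpoint."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state, even if not on a boundary
    return state
```

If a job is stopped at the 72-hour limit, resubmitting it with the same checkpoint path continues from the last saved step instead of starting over. The checkpoint interval is a trade-off between I/O overhead and the amount of work lost on interruption.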
Exceptions: If you require access to nodes for a longer period of time, or need access to more nodes than are allowed by default, please contact us at RCSG@utsa.edu with an exemption request. We will need a brief description of the activity for your request, along with the number of cores and nodes required, and the time duration for which you are requesting the exemption. Also, please inform us if you have checkpoints built into your code or applications, so your jobs will restart automatically if paused.
Data Storage (Disk Usage)
Work Directory – as detailed in our Wiki, this directory is where you should place any input/output files as well as logs for your running jobs. This directory is NOT backed up and is not intended for long-term storage.
Work Directory Data Retention – Effective August 13, 2020, all files in the Work directory that have not been accessed in the last 30 days will be likely candidates for deletion.
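To see which files might fall under this retention rule, a user could list the files whose last access time is older than 30 days. The Python sketch below is only an approximation: the exact criteria used by the system are not described here, and recorded access times depend on how the filesystem is mounted. The name stale_files is hypothetical.

```python
import os
import sys
import time


def stale_files(root, days=30, now=None):
    """Yield paths under `root` whose last access time is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                yield path


if __name__ == "__main__":
    # Pass the directory to scan as the first command-line argument.
    for path in stale_files(sys.argv[1]):
        print(path)
```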
Home Directory – this directory is backed up but should only be used for installing and compiling code. Storage of datasets is permitted here, but a hard quota of 25 GB will be in place.
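A rough way to see how close a directory is to the 25 GB quota is to sum the sizes of the files under it. The sketch below makes two assumptions: that 25 GB means 25 × 10^9 bytes, and that summing apparent file sizes is a fair proxy for what the quota system counts. The actual quota accounting may differ, so any quota tool provided on the system remains the authoritative figure. The function names are hypothetical.

```python
import os
import sys

QUOTA_BYTES = 25 * 1000**3  # assumption: decimal gigabytes


def directory_usage(root):
    """Return the total apparent size, in bytes, of all files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def quota_fraction(root, quota=QUOTA_BYTES):
    """Return usage of `root` as a fraction of the quota (1.0 means full)."""
    return directory_usage(root) / quota


if __name__ == "__main__":
    # Pass the directory to measure as the first command-line argument.
    print(f"{quota_fraction(sys.argv[1]):.1%} of quota used")
```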