
Arc User-Guide

  1. Arc is the primary High Performance Computing (HPC) system at The University of Texas at San Antonio (UTSA). It can be used for running data-intensive, memory-intensive, and compute-intensive jobs from a wide range of disciplines and is equipped with:
    • 174 compute/GPU nodes and 2 login nodes in total; the majority of these nodes have Intel Cascade Lake CPUs and some have AMD EPYC CPUs
    • 30 GPU nodes, each with two 20-core CPUs (40 cores total), 384 GB of RAM, and one NVIDIA V100 GPU accelerator
    • 5 GPU nodes, each with two 20-core CPUs (40 cores total), 384 GB of RAM, and two NVIDIA V100 GPU accelerators
    • 2 GPU nodes, each with two CPUs, four NVIDIA V100 GPUs, and 384 GB of RAM
    • 2 GPU nodes, each with two AMD EPYC CPUs, one NVIDIA A100 80 GB GPU, and 1 TB of RAM
    • 2 large-memory nodes, each with four 20-core CPUs (80 cores total) and 1.5 TB of RAM
    • 1 large-memory node with two AMD EPYC CPUs and 2 TB of RAM
    • 1 node with two AMD EPYC CPUs and 1 TB of RAM
    • 5 nodes, each with two AMD EPYC CPUs, one NEC vector engine, and 1 TB of RAM
    • 100 Gb/s InfiniBand connectivity

    • Two Lustre filesystems, /home and /work, where /home has 110 TB of capacity and /work has 1.1 PB of capacity

    • A cumulative total of 250 TB of local scratch space (approximately 1.5 TB of /scratch on most compute/GPU nodes)

    • Multiple partitions (or queues) having different characteristics and constraints
      • amdonly: 1 node
      • amdbigmem: 1 node
      • amdgpu: 2 nodes
      • amdvector: 5 nodes
      • bigmem: 2 nodes
      • compute1: 65 nodes
      • compute2: 25 nodes
      • computedev: 5 nodes
      • gpu1v100: 28 nodes
      • gpu2v100: 5 nodes
      • gpu4v100: 2 nodes
      • gpudev: 2 nodes
      • two privately owned partitions consisting of 24 nodes
      • one privately owned partition equipped with 3 DGX A100 80 GB GPUs
      • two privately owned partitions equipped with Dell XE8640 servers with 4x H100 GPUs
    • Arc is accessible over SSH using two-factor authentication with DUO. The hostname for Arc is arc.utsa.edu and the SSH port number is 22. To use DUO, you must first register online at passphrase.utsa.edu. A minimal login example is shown below.
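      As a minimal sketch (abc123 is the placeholder username used throughout this guide; replace it with your own), logging in from a Linux or macOS terminal looks like this, with DUO prompting for the second factor after your password:

        ssh abc123@arc.utsa.edu
        # enter your passphrase, then approve the DUO push or enter a DUO passcode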

  2. Arc Fair-Use Policies
    • Running Jobs
      • Compute nodes are not shared among multiple users: when a user is allocated a compute node, they are the only user allowed to access it. This is done for both security and performance reasons, since multiple users sharing the same node can suffer degraded performance due to resource contention. While jobs from different users are no longer scheduled on the same node, users are encouraged to take advantage of tools such as GNU Parallel to co-schedule their multiple independent tasks on the compute nodes allocated to them. Please see Section 10 of the user-guide for further details on running multiple tasks concurrently on one or more nodes from a single Slurm job.
      • Each user is limited to 10 active jobs at a given point in time and to a maximum of 20 compute nodes across those jobs. As each compute node is dual-socket with a 20-core processor on each socket, up to 800 cores could potentially be in use by a single user at a given point in time. A sample job script that stays within these limits is shown after this list.
      • Each job is limited to a run-time of no more than 72 hours. Users are encouraged to consider implementing checkpoint-restart capabilities in their home-grown applications; the research computing support group will be happy to provide guidance on implementing checkpoint-restart mechanisms in users' code. Some third-party software packages, like the FLASH astrophysics code, already have built-in checkpoint-restart capabilities. Such capabilities can be enabled by setting the required environment variables. Users are encouraged to review the documentation of their software to confirm whether checkpoint-restart functionality is available in the software of their choice. Section 16 of this user-guide has further information on using checkpointing and restart.
      • Exceptions: If you require access to nodes for a longer period of time, or need access to more nodes than are allowed by default, please submit a service request ticket with an exemption request at the following URL: https://support.utsa.edu/myportal. We will need a brief description of the activity, the number of cores and nodes required, and the time duration for which you are requesting the exemption. We also ask that you explore options for checkpointing your code before submitting the service request.
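        The fair-use limits above map directly onto Slurm batch directives. The following is a minimal sketch only, not an official template: the partition name, job name, node count, and executable are illustrative and should be adjusted to your workload.

          #!/bin/bash
          #SBATCH --job-name=myjob           # illustrative job name
          #SBATCH --partition=compute1       # one of the partitions listed in Section 1
          #SBATCH --nodes=2                  # each user is limited to 20 nodes in total
          #SBATCH --ntasks-per-node=40       # 40 cores per dual-socket compute node
          #SBATCH --time=72:00:00            # jobs may not exceed 72 hours of run-time

          ./my_program                       # hypothetical executable; use srun for MPI programs

        Submit the script with "sbatch jobscript.slurm" and monitor it with "squeue -u abc123".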
    • Data Storage (Disk Usage)
      • Work Directory (/work/abc123) – as detailed in our Wiki, this directory is where you should place any input/output files as well as logs for your running jobs. This directory is NOT backed up and is not intended for long-term storage.
      • Work Directory Data Retention – all files in the Work directory that have not been accessed in the last 30 days will be likely candidates for deletion.
      • Home Directory (/home/abc123) – this directory is backed up but should only be used for installing and compiling code. Storage of datasets is permitted here, but a hard quota of 100 GB is enforced (a quota-check example is shown after this list).
      • Vault Directory (/vault/research/abc123) – each user on Arc is provided 1 TB of archival storage located in /vault. This storage space is accessible from Arc, as well as from Windows or Mac computers. This data is backed up and the backups are replicated to UT Arlington for an extra layer of protection. If additional storage space is needed on the "vault" system, please submit a service request at the following URL: https://support.utsa.edu/myportal
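        Since /home and /work are Lustre filesystems, your usage against the quotas described above can typically be checked with the Lustre lfs utility. This is a sketch under the assumption that quotas are enforced through Lustre's quota mechanism (abc123 is the placeholder username):

          lfs quota -h -u abc123 /home      # usage against the 100 GB /home quota
          lfs quota -h -u abc123 /work      # usage on /work, if quotas are configured there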

  3. Requesting an Account on Arc
    • If you are interested in requesting an account on Arc, please visit the support portal and search for "HPC account"

      Please note that sharing of User Credentials is strictly prohibited. Any violation of this policy could lead to suspension of your account on Arc.


  4. Prerequisite: Arc runs a Linux operating system, so basic knowledge of Linux is required to work efficiently on Arc from the command line.
    If you need help with learning Linux, the following link provides a quick overview of Linux and basic Linux commands: Express Linux Tutorial. A few everyday commands are shown below for quick reference.
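    For quick reference, a few everyday Linux commands (the file and directory names used here are illustrative):

      pwd                      # print the current working directory
      ls -l                    # list files in the current directory with details
      cd /work/abc123          # change to your work directory
      cp input.txt backups/    # copy a file into a directory
      man ls                   # read the manual page for a command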

  5. Logging into Arc, Submitting Jobs, and Monitoring Jobs on Arc

  6. File transfer
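    Files can be copied to and from Arc with standard SSH-based tools such as scp and rsync; DUO authentication applies to these transfers as well. A minimal sketch, with illustrative file and directory names:

      # copy a local file to your work directory on Arc
      scp mydata.csv abc123@arc.utsa.edu:/work/abc123/
      # synchronize a local directory to Arc, showing progress and resuming partial transfers
      rsync -avP results/ abc123@arc.utsa.edu:/work/abc123/results/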

  7. Modules for Managing User Environment on Arc

  8. Running C, C++, Fortran, Python, and R applications in Serial Mode
    • Both batch and interactive modes of running serial applications are covered; a minimal sketch of an interactive run is shown below
    • Code and scripts used in the examples shown in the document are available from this GitHub repository
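    As a minimal sketch of the interactive mode (the partition name, time limit, and module name are illustrative and may differ on Arc):

      # request an interactive shell on one compute node for one hour
      srun -p compute1 -N 1 -n 1 -t 01:00:00 --pty bash
      # on the compute node, load an environment and run the program serially
      module load python               # assumed module name; check "module avail"
      python3 my_script.py             # hypothetical script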

  9. Running Parallel Programs
    • Code and scripts used in the examples shown in the document are available from this GitHub repository
    • OpenMP, MPI, and CUDA examples are covered in this document
    • C, C++ and Fortran are the base languages used
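    As a compile-and-run sketch only (the source file names are illustrative, and the compiler and MPI modules that must be loaded first depend on Arc's module tree):

      # OpenMP: compile, then run with one thread per core on a 40-core node
      gcc -fopenmp -O2 -o hello_omp hello_omp.c
      export OMP_NUM_THREADS=40
      ./hello_omp

      # MPI: compile with the MPI wrapper, then launch the ranks under Slurm
      mpicc -O2 -o hello_mpi hello_mpi.c
      srun -n 80 ./hello_mpi           # e.g., 80 ranks across two 40-core nodes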

  10. Running Multiple Copies of Executables Concurrently from the Same Job
    • Running multiple executables concurrently from the same job is covered
    • Using GNU Parallel for running parameter-sweep applications is covered
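    As a sketch of the GNU Parallel pattern for a parameter sweep inside a single Slurm job (the executable name and parameter range are illustrative, and GNU Parallel may need to be loaded as a module first):

      # run ./simulate once for each parameter value, keeping 40 tasks running at a time
      parallel -j 40 ./simulate {} ::: $(seq 1 100)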

  11. Accessing and Running Code on Vector Engines in Arc

  12. Additional Python and R Usage Information

  13. Using Some of the Popular Software Packages that are Installed System-Wide

  14. Using Containers (Singularity and Docker) on Arc

  15. Open On Demand Virtual Desktop

  16. Visualization Using Paraview on Arc

  17. Setting Java Environment for Applications with Java Dependencies

  18. Application Checkpointing and Restart on Arc

  19. Checking Currently Installed Software on Arc
    • To check the list of the currently available software packages on Arc, please use the "module spider" or "module avail" command from a compute node
    • Details on using the module commands for managing the shell environment on Arc are available here
    • By default, a module named XALT [1] is loaded into everyone's shell environment. XALT is a tool that allows the Arc HPC support staff to collect and understand job-level information about the libraries and executables that end-users access during their jobs. This assists us in tracking user executables and library usage on the cluster. If you experience an issue that may involve XALT, the module can be removed using the module unload command.
    • The list of software packages that are available on Arc as of August 23, 2021 can be found here
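    For reference, the module commands mentioned above look like the following (the package name searched for is illustrative, and the exact name of the XALT module can be confirmed with "module list"):

      module avail                 # list packages visible in the current module tree
      module spider python         # search all module trees for a package
      module list                  # show currently loaded modules, including XALT
      module unload xalt           # remove XALT from the environment if it causes issues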

  20. Technical Support
    • For technical support, you can submit a support request for Arc at the following link: https://support.utsa.edu/myportal. Instructions for submitting support requests can be found here.
    • The Research Computing Support Group is available from 8:00 AM to 5:00 PM on all business days to assist with service requests.
    • Our time-to-response on new tickets is 4 business hours, and the time-to-resolution varies depending upon the complexity of the issue.
      • Please open a new ticket for every new topic
      • Once a ticket is closed, you are welcome to reopen it if the exact topic addressed in the ticket remains unresolved
    • For after-hours emergency support, please contact Tech Cafe at 210-458-5555.

  21. Training and Workshops

References
  1. "User Environment Tracking and Problem Detection with XALT," K. Agrawal, M. R. Fahey, R. McLay, and D. James, In Proceedings of the First International Workshop on HPC User Support Tools, HUST '14, Nov. 2014. dx.doi.org/10.1109/HUST.2014.6.
Attachments
  • Deep Learning Model on CIFAR10 dataset using PyTorch on GPU nodes.pdf – PyTorch on GPUs
  • Express_Linux_Tutorial-SizeOptimized.pdf – Quick Linux tutorial (saved as a "reduced size" PDF to stay below the 10 MB size limit)
  • Installation and Working of Deep Learning Libraries (TensorFlow) on Remote Linux Systems (Stampede2 and Arc).pdf – TensorFlow
  • RUNNING MATLAB "Hello, World" Example on Remote Linux Systems (1).pdf – Sample MATLAB job
  • Running_Jobs_On_Arc.pdf – Running jobs on Arc
  • migrate-shamu2arc – Bash wrapper script for rsync to migrate user home and/or work data from Shamu to Arc
  • running_c_cpp_fortran_python_r.pdf – Running C, C++, Fortran, Python, and R applications in serial mode
  • running_executables_and_gnu_parallel.pdf – Executables and GNU Parallel
  • running_parallel_programs_on_Arc.pdf – Running parallel programs on Arc