Managing your Files
Users on Arc are automatically granted several locations to store their files. Most users will store their files in one of the two locations listed below:
- $HOME directory - /home/abc123 - This directory is entirely controlled by you. By default, nobody else can see or access your files.
- Work directory - /work/abc123 - This directory is where you should place any input/output files as well as logs for your running jobs. This directory is NOT backed up.
Below are instructions for transferring files to Arc from Windows, Mac, and Linux systems.
On Microsoft Windows, an SFTP client must be downloaded to transfer files to Arc. This guide will use the “MobaXterm” application, which was also used in the connecting (ssh) and X-forwarding guides. Other common SFTP applications are listed below; all of these will work fine with Arc.
File Transfer using MobaXterm
When you log in to a remote Arc session using SSH, a graphical SFTP (Secure File Transfer Protocol) browser appears in the left sidebar allowing you to drag and drop files directly to or from Arc using the SFTP connection. To manually open a new SFTP session:
- Open a new session
Windows-based editors generally put an extra “carriage return” (also referred to as control-M/^M) character at the end of each line of text. This will cause problems for most Linux-based applications. To correct this problem, execute the built-in utility dos2unix on each ASCII file you upload from your Windows machine to Arc:
[abc123@login01 ~]$ dos2unix filename
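To check whether a file actually contains carriage returns before converting it, a sketch along the following lines may help. It uses only standard tools (printf, grep, tr). The file name sample.txt is a placeholder, and tr -d '\r' is shown as a stand-in with a similar effect to dos2unix, for systems where that utility is not available.

```shell
# Build a small file with Windows (CRLF) line endings for demonstration.
# "sample.txt" is a hypothetical file name.
workdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$workdir/sample.txt"

# Count the lines that contain a carriage return character.
crlf_before=$(grep -c "$(printf '\r')" "$workdir/sample.txt")

# Remove every carriage return and write the result to a new file.
tr -d '\r' < "$workdir/sample.txt" > "$workdir/sample.unix.txt"

# grep exits non-zero when nothing matches, so "|| true" keeps the script going.
crlf_after=$(grep -c "$(printf '\r')" "$workdir/sample.unix.txt" || true)

echo "lines with CR before: $crlf_before, after: $crlf_after"
```

A non-zero count before conversion means the file still has Windows line endings.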
There are several ways to transfer files from a Linux or Mac system to Arc. You can use either scp or rsync from the command-line.
Open a Terminal and use the scp command as shown in the example below, assuming your Arc username is abc123:
localMachine% scp LocalFile abc123@login.Arc.utsa.edu:/path/to/destination
Below is a specific example:
localMachine% scp hello_world.py abc123@login.Arc.utsa.edu:/work/abc123/.
scp works like cp but uses ssh to connect, so you will be asked for your campus passphrase again.
To copy a directory from the local system to a remote system, use the -r option:
localMachine% scp -r /local/directory abc123@login.Arc.utsa.edu:/remote/directory
The scp command can also be used to transfer files between any two remote systems. As an example, a file named "filetest" in your /work/abc123 directory on Shamu can be transferred to the corresponding work directory on Arc using a command like the following:
[abc123@login01 abc123]$ scp ./filetest abc123@arc.utsa.edu:/work/abc123/.
Warning: Permanently added 'arc.utsa.edu,18.104.22.168' (ECDSA) to the list of known hosts.
filetest 100% 9 3.9KB/s 00:00
You can use wildcards in the scp command as shown below:
[abc123@login01 abc123]$ scp *.txt abc123@login.Arc.utsa.edu:/work/abc123/.
This copies all files in the current directory whose names end in ".txt" to Arc.
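The local shell expands the wildcard before scp starts, so it can be useful to preview which files a pattern matches. The sketch below does this in a scratch directory. The file names are hypothetical, and echo stands in for the real scp call so that nothing is transferred.

```shell
# Work in a temporary directory with a few hypothetical files.
workdir=$(mktemp -d)
cd "$workdir"
touch notes.txt results.txt image.png

# Prefixing the command with echo prints the expanded argument list
# instead of running scp.
preview=$(echo scp *.txt abc123@login.Arc.utsa.edu:/work/abc123/.)
echo "$preview"
```

Only the two .txt files appear in the printed command; image.png is not matched by the pattern.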
When copying a directory that contains many files, use tar to create a compressed archive of the directory, then transfer the archive as a single file:
[abc123@login01 abc123]$ tar -czvf ./mydata.tar.gz mydata # create gzip-compressed archive
[abc123@login01 abc123]$ scp ./mydata.tar.gz abc123@login.Arc.utsa.edu:/work/abc123/. # transfer archive
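Before transferring an archive, it can help to list its contents, and after the transfer the archive has to be unpacked at the destination. The following is a minimal local sketch of both steps. The directory and file names are hypothetical, and the "unpacked" directory stands in for the destination.

```shell
# Set up a small hypothetical directory to archive.
workdir=$(mktemp -d)
cd "$workdir"
mkdir mydata
echo "sample" > mydata/a.txt

# -c create, -z compress with gzip, -f name of the archive file
tar -czf mydata.tar.gz mydata

# -t lists the archive contents without extracting anything.
listing=$(tar -tzf mydata.tar.gz)
echo "$listing"

# -x extracts; -C selects the directory to unpack into.
mkdir unpacked
tar -xzf mydata.tar.gz -C unpacked
```

On Arc, the extraction step would be run in the work directory after the scp transfer has finished.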
See the man page for scp for additional information and details on the command.
Transfer Using RSYNC
The rsync utility synchronizes files maintained on source and destination systems. Unlike scp, rsync copies only the changed portions of individual files. It is therefore efficient to use rsync when you only need to update a small fraction of a large dataset at the destination. A basic rsync command looks as follows:
[abc123@login01 abc123]$ rsync myfile abc123@login.Arc.utsa.edu:/work/abc123/.
This command copies the file "myfile" to the destination directory on Arc.
To copy a directory and all subdirectories to a remote location, use this command:
[abc123@login01 abc123]$ rsync -avtr mydirectory abc123@login.Arc.utsa.edu:/work/abc123/.
- a = Archive mode: preserve symbolic links, permissions, time stamps, and other metadata (this option already implies -r and -t)
- v = Verbose output
- r = Recursive copy, i.e., copy this directory and all subdirectories
- t = Preserve time stamps
The options on this rsync command are useful when synchronizing your data to a remote location. The first time you run this command, a copy of the file and directory structure is created at the remote location. After the initial copy, if some files then change on the source side and the command is run again, only those changes are copied to the remote location.
Also, if the rsync data transfer is interrupted for some reason, you can just re-run the command to copy over and sync those items that did not get previously copied or updated.
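To see in advance what a repeated rsync run would transfer, the -n (--dry-run) option can be combined with --out-format. The sketch below illustrates the incremental behaviour using two local directories, so that it does not need a connection to Arc. The directory and file names are hypothetical, and "dest" stands in for the remote work directory.

```shell
# Set up a hypothetical source directory and a local stand-in destination.
workdir=$(mktemp -d)
cd "$workdir"
mkdir mydirectory dest
echo "one" > mydirectory/a.txt
echo "two" > mydirectory/b.txt

if command -v rsync >/dev/null 2>&1; then
    # First run: the whole directory is copied to dest/mydirectory.
    rsync -a mydirectory dest/

    # Change one file, then preview the next run. With -n, rsync reports
    # what it would transfer without copying anything; --out-format='%n'
    # prints one name per item that would be updated.
    echo "changed" > mydirectory/a.txt
    pending=$(rsync -a -n --out-format='%n' mydirectory dest/)
else
    pending="rsync is not installed on this machine"
fi
echo "$pending"
```

Only the modified file is reported as pending; the unchanged file is skipped. Running the same command without -n then performs the actual update.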
See the man page for rsync for additional information and details on the command.