
Using NEC SX-Aurora Vector Engines On Arc

This guide provides an overview of the NEC SX-Aurora TSUBASA Vector Engines (VE) nodes available on the Arc HPC Cluster.

ARC Vector Engine Components
Accessing Vector Engine Nodes
Vector Engine Coding
- Compiling Vector Engine Code
- Running Your Compiled Programs
Additional Resources & Documentation

ARC Vector Engine Components

Vector Engine Compiler Node (vc001)

The ARC HPC environment includes 1 x VE Compiler node. This node is licensed to enable users to compile code for the Vector Engine nodes. After compiling your VE code, you can run your compiled programs on any of the 5 x Vector Engine nodes.

Cross Compilers:

nfort
ncc
nc++

Tools:

nld (Linker)
nar (Archiver)
nranlib (Index generator for archives)

MPI wrappers:

mpinfort
mpincc
mpinc++

Vector Engine Compute Nodes (v001 - v005)

The Arc HPC environment includes 5 x AMD Compute Nodes each with a NEC Vector Engine Card, providing increased memory bandwidth and computational ability with increased power efficiency.

VE Compute Node specifications:

Server node Cores: 2 x physical AMD CPUs, each with 8 cores and hyperthreading (simultaneous multithreading), providing a total of 32 logical cores
Server node RAM: 1TB
Vector Engine Card RAM: 48GB
Vector Engine Card Cores: 8
Vector Engine Card Memory Bandwidth: 1.53TB/s

The Vector Engine nodes are named: v001 - v005

Vector Engine Slurm Partitions

The Arc HPC cluster contains over a dozen Slurm partitions, each representing a unique set of resources to help meet the scientific computing needs of our users. Two partitions have been set up specifically for use with the Vector Engine nodes:

amdvcompiler - Partition containing node vc001 that is used to compile VE code

Jobs in this partition are limited to 2 hours. Because vc001 is the only node available for compiling, the short time limit keeps access fair for all users.

amdvector - Partition containing nodes v001 - v005 that are used to run VE code

Jobs in this partition have a 3 day time limit and can run on any one of the VE nodes.

Accessing Vector Engine Nodes

The AMD Vector nodes are accessed like any other node in the cluster.

Interactive shell with SRUN

You can use the following commands to get an interactive shell on one of the nodes.

This command will provide a bash shell on one of the Vector Engine nodes:

[login001: abc123]$ srun -p amdvector -n 1 -t 01:30:00 --pty bash

This command will provide a bash shell on the Vector Compiler node:

[login001: abc123]$ srun -p amdvcompiler -n 1 -t 01:30:00 --pty bash

Submitting Batch jobs with SBATCH

You can also submit a batch job:

[login001: abc123]$ sbatch my_ve_jobscript.slurm
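
A job script for the amdvector partition might look like the following minimal sketch. The file name matches the sbatch command above; the job name, output file pattern, time request, and the mmultest arguments are illustrative assumptions, not site defaults.

```shell
# Write a minimal job script (sketch; adjust names and limits to your job).
cat > my_ve_jobscript.slurm <<'EOF'
#!/bin/bash
# Assumed job name and output file; %j expands to the Slurm job ID.
#SBATCH --job-name=ve_mmultest
#SBATCH --output=ve_mmultest_%j.out
# Run on the VE compute nodes (v001 - v005) with one task.
#SBATCH --partition=amdvector
#SBATCH --ntasks=1
# Requested wall time; must stay within the partition's 3 day limit.
#SBATCH --time=01:30:00

# Load the NEC vector environment, then run a binary built on vc001.
module load vector/2.8-1
./mmultest 3000 2000 5000
EOF
```

Once the file exists, submit it with the sbatch command shown above.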

Loading the Vector Linux Module

After accessing a node in an interactive shell, you can load the NEC Vector module to update your PATH and other environment variables. This enables you to invoke the compiler and various other tools without having to specify the full path. This command can also be included in your SBATCH script to ensure the proper environment variables are available for your job.

When the vector module is loaded, two shell scripts are sourced as part of the process: nlcvars.sh and necmpivars.sh. These scripts provide additional environment variable definitions and can be called with additional parameters, if needed. The module sources the scripts without any parameters. A brief explanation of the additional parameters is included in the output when loading the vector module.

[vc001: abc123]$ module load vector/2.8-1
The vector  module version 2.8-1  is loaded.

--- Sourcing: /opt/nec/ve/nlc/2.3.0/bin/nlcvars.sh
Additional Notes:
/opt/nec/ve/nlc/2.3.0/bin/nlcvars.sh can be called with alternate parameters:

Usage: source nlcvars.sh [ARGUMENT]...

  ARGUMENT:
  i64  specify the default integer type is 64-bit
       (default: 32-bit)
  mpi  specify MPI is used
       (default: no use of MPI)

--- Sourcing: /opt/nec/ve/mpi/2.21.0/bin64/necmpivars.sh
Additional Notes:
/opt/nec/ve/mpi/2.21.0/bin64/necmpivars.sh can be called with alternate parameters:

necmpivars.sh can take additional parameters, however the
"necmpivars.sh [gnu|intel] [version]" format should only be
used at runtime in order to use VH MPI shared libraries other
than those specified by RUNPATH embedded in a MPI program
executable by the MPI compile command.  In other cases,
"source /opt/nec/ve/mpi/2.21.0/bin/necmpivars.sh"
should be used without arguments.

The "version" parameter is a directory name in the following directory:
  /opt/nec/ve/mpi/2.21.0/lib64/vh/gnu (if gnu is specified)
  /opt/nec/ve/mpi/2.21.0/lib64/vh/intel (if intel is specified)
[vc001: abc123]$
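
If the defaults do not fit, for example when a program needs 64-bit default integers together with MPI, nlcvars.sh can be sourced again by hand after loading the module. The following is a minimal sketch that uses the path and the i64 and mpi arguments from the output above. The helper name source_nlcvars is hypothetical, and the guard exists only so the snippet fails cleanly on a node where the NEC software is not installed.

```shell
# Path as reported by "module load vector/2.8-1" above.
NLCVARS=/opt/nec/ve/nlc/2.3.0/bin/nlcvars.sh

# Hypothetical helper: source nlcvars.sh with the given arguments,
# or return non-zero if the script is not readable on this node.
source_nlcvars() {
    if [ -r "$NLCVARS" ]; then
        . "$NLCVARS" "$@"
    else
        echo "nlcvars.sh not found at $NLCVARS" >&2
        return 1
    fi
}

# Request 64-bit default integers and MPI support.
source_nlcvars i64 mpi || echo "NLC environment not changed"
```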

Vector Engine Coding

Compiling Vector Engine Code

Due to licensing restrictions, Vector Engine code can only be compiled on the Vector Engine Compiler node, vc001.

There are several compilers available on vc001, including ncc for C, nfort for Fortran, and nc++ for C++. Each compiler has numerous options and suboptions. A brief explanation of the compiler options is provided by invoking the command with the --help option.

[login001: abc123]$ srun  --pty -t 02:00:00 -n 1 -p amdvcompiler bash
[vc001: abc123]$ module load vector
...
[vc001: abc123]$ ncc --help

For example, this matrix multiplication C program, mmultest.c, is compiled with several options for vector optimization:

[vc001: abc123]$ ncc mmultest.c -O4 -report-all -fdiag-vector=2 -o mmultest
ncc: opt(1592): mmultest.c, line 11: Outer loop unrolled inside inner loop.: j
ncc: vec( 101): mmultest.c, line 14: Vectorized loop.
ncc: vec( 126): mmultest.c, line 15: Idiom detected.: Sum
ncc: vec( 128): mmultest.c, line 15: Fused multiply-add operation applied.
ncc: opt(1592): mmultest.c, line 24: Outer loop unrolled inside inner loop.: i
ncc: vec( 101): mmultest.c, line 26: Vectorized loop.
[vc001: abc123]$

Where:

-O indicates optimization level: 0=Disabled, 4=Aggressive Optimization
-fdiag-vector=n sets the vector diagnostics level to n (0: No output, 1: Information, 2: Detail; default: -fdiag-vector=1)
-report-all outputs the code generation list, diagnostic list, format list, inline list, option list and vector list.
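
To compare optimization levels side by side, the same source can be built twice under different output names. This is a minimal sketch that uses only the options described above. The output names mmultest_O4 and mmultest_O0 are illustrative, and the guard simply skips the build where ncc is not on the PATH or the source file is missing.

```shell
SRC=mmultest.c

if command -v ncc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Aggressive optimization with detailed vector diagnostics.
    ncc "$SRC" -O4 -report-all -fdiag-vector=2 -o mmultest_O4
    # Optimization disabled, as a baseline for timing comparisons.
    ncc "$SRC" -O0 -o mmultest_O0
else
    echo "ncc or $SRC not available; run this on vc001 after 'module load vector'" >&2
fi
```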

Running Your Compiled Programs

Your compiled programs can be run on any of the Vector Engine nodes: v001 - v005.

Continuing with the matrix multiplication C program example, mmultest.c, from the previous section, you can use SRUN to access the amdvector Slurm partition and then run your code.

[login001: abc123]$ srun  --pty -t 02:00:00 -n 1 -p amdvector bash
[v001: abc123]$ module load vector
...
[v001: abc123]$ ./mmultest 3000 2000 5000
The elapsed time to multiply a [3000 x 2000] matrix with a [2000 x 5000] matrix is 2.41 seconds.

If you had instead compiled your code with the -O0 option, which disables optimization (including vectorization), the same matrix multiplication takes much longer to run:

[v001: abc123]$ ./mmultest 3000 2000 5000
The elapsed time to multiply a [3000 x 2000] matrix with a [2000 x 5000] matrix is 1266.63 seconds.
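
The two timings above give a rough measure of what vector optimization buys for this program. As a small worked example, the speedup can be computed from the reported elapsed times; the variable names are illustrative.

```shell
T_O4=2.41        # elapsed seconds with -O4 (from the run above)
T_O0=1266.63     # elapsed seconds with -O0 (from the run above)

# awk handles the floating-point division that plain shell arithmetic cannot.
SPEEDUP=$(awk -v fast="$T_O4" -v slow="$T_O0" 'BEGIN { printf "%.1f", slow / fast }')
echo "Speedup from -O4 over -O0: ${SPEEDUP}x"
```

For these two runs the ratio works out to roughly 525x.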

Additional Resources & Documentation

Getting Started: Aurora Vectorization Training

This site contains training slides and exercises that can be used for self study: SX-Aurora TSUBASA Vectorization Training

Getting Started: SX-Aurora TSUBASA Performance Tuning Guide

The SX-Aurora TSUBASA Performance Tuning Guide is a recommended resource for providing a comprehensive background and foundation for utilizing the Vector Engines.

An archived copy of the document is available here: AuroraVE_TuningGuide.pdf

An online version is also available here: AuroraVE_TuningGuide

Other Resources

Additional resources can be found at this link: SX-Aurora Documentation Library

-- AdminUser - 21 Sep 2022

Attachments:

AuroraVE_TuningGuide.pdf (1 MB, 21 Sep 2022 - 15:09, AdminUser)
mmultest.c (1 K, 21 Sep 2022 - 14:44, AdminUser)

Topic revision: r9 - 21 Sep 2022, AdminUser
