Gravity

From SciNetWiki
Gravity
[Image: IBM iDataPlex dx360 M4]
Installed: December 2012
Operating System: Linux CentOS 6.4
Number of Nodes: 49 (588 CPU cores, 50,176 GPU cores)
Interconnect: QDR InfiniBand
RAM/Node: 32 GB
Cores/Node: 12, with 2x M2090 GPUs
Login/Devel Node: gravity01 (from login.scinet)
Vendor Compilers: nvcc, pgcc, icc, gcc
Queue Submission: Torque

The Gravity cluster will be decommissioned by the end of 2017. A new system for large parallel jobs, Niagara, will replace the GPC and TCS and is expected to be in production in early 2018. Contributed systems like Sandy and Gravity will also reach their end-of-life in this transition. The aim is to keep at least 50% of the GPC available during the installation of the new system. Users will be informed about further details of the transition as they become available.

The Gravity cluster consists of 49 x86_64 nodes, each with two hex-core Intel Xeon (Sandy Bridge) E5-2620 2.0 GHz CPUs and 32 GB of RAM. Each node also has two NVIDIA Tesla M2090 GPUs of compute capability 2.0 (Fermi), each with 512 CUDA cores and 6 GB of RAM. The nodes are interconnected with 3:1 blocking QDR InfiniBand for MPI communication and for disk I/O to the SciNet GPFS filesystems. In total the cluster contains 588 x86_64 cores with 1,568 GB of system RAM, and 98 GPUs with 588 GB of GPU RAM.

NB: Gravity is a user-contributed system acquired through a CFI Leaders Opportunity Fund (LOF) grant to a specific PI. Policies regarding use by other groups are under development and subject to change at any time.

Note that SciNet has a mailing list for people interested in GPGPU computing. To receive information on courses, workshops and other GPGPU-related events, sign up at https://support.scinet.utoronto.ca/mailman/listinfo/scinet-gpgpu.


Nodes

Login

First log in via ssh with your SciNet account at login.scinet.utoronto.ca; from there you can proceed to gravity01, the GPU development node.

Devel

As mentioned, gravity01 is the head/development node for interactive use. This node is for compiling, short tests, and submitting batch jobs to the compute nodes. It is a shared resource, so treat it accordingly, and use the queue and compute nodes for long or large computations.

ARC Experimental (ARCX) Xeon Phi / Tesla K20

A separate devel node, arcX, with a single Intel Xeon Phi and an NVIDIA Tesla K20, is also available for testing these newer technologies. For full details see the Xeon Phi / Tesla K20 wiki page.

Compute

To access the other 48 compute nodes with GPUs, you need to use the queue, as with the standard GPC compute nodes. Nodes are currently scheduled as complete nodes (12 cores and 2 GPUs each), with a maximum walltime of 12 hours.

For an interactive job use

qsub -l nodes=1:ppn=12:gpus=2,walltime=12:00:00 -q gravity -I

or for a batch job use

qsub script.sh 

where script.sh is

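#!/bin/bash
# Torque submission script for Gravity
#
#PBS -l nodes=2:ppn=12:gpus=2,walltime=1:00:00
#PBS -N GPUtest
#PBS -q gravity
cd "$PBS_O_WORKDIR"

# Load a cuda module; this also corrects CUDA_VISIBLE_DEVICES (see the note below).
# Also load the MPI module the program was built with (see the MPI section).
module load gcc/4.8.1 cuda/6.0

# EXECUTION COMMAND; -np = nodes*ppn = 2*12 = 24
mpirun -np 24 ./a.out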
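Rather than hard-coding -np, the rank count can be derived inside the job from the node file Torque writes for each job. The sketch below assumes $PBS_NODEFILE lists one hostname per allocated core, so nodes=2:ppn=12 gives 24 lines; the helper names count_ranks and count_nodes are hypothetical, not SciNet tools.

```shell
#!/bin/bash
# Hypothetical helpers: derive MPI rank and node counts from Torque's node file.
# Assumes the file lists one hostname per allocated core (ppn entries per node).

count_ranks() {
    # Total number of lines = nodes * ppn
    local nodefile="${1:-$PBS_NODEFILE}"
    wc -l < "$nodefile" | tr -d ' '
}

count_nodes() {
    # Number of distinct hostnames = number of nodes
    local nodefile="${1:-$PBS_NODEFILE}"
    sort -u "$nodefile" | wc -l | tr -d ' '
}

# Only launch when running inside a Torque job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    echo "Running on $(count_nodes) nodes with $(count_ranks) ranks"
    mpirun -np "$(count_ranks)" ./a.out
fi
```

With this, changing the nodes= or ppn= request no longer requires editing the mpirun line as well.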

To list only the jobs on the GPU nodes, use

showq -w class=gravity

Important note:

A bug in the Torque scheduler currently sets the environment variable CUDA_VISIBLE_DEVICES to an incorrect value. Loading any one of the cuda modules corrects this, so be sure to do so in your job scripts and in your interactive jobs.
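A minimal job-script fragment following this advice, as a sketch: it loads a cuda module first and then checks how many devices CUDA_VISIBLE_DEVICES lists. The module versions are the ones recommended in the CUDA section below; count_visible_gpus is a hypothetical helper, and the expectation of two devices per node follows from the node description above.

```shell
#!/bin/bash
# Load a cuda module before any GPU code runs; per the note above this
# corrects CUDA_VISIBLE_DEVICES. 'module' only exists where the modules
# framework has been initialised, so guard the call.
if command -v module >/dev/null 2>&1; then
    module load gcc/4.8.1 cuda/6.0
fi

# Hypothetical helper: count the comma-separated device IDs in a
# CUDA_VISIBLE_DEVICES-style string (defaults to the current value).
count_visible_gpus() {
    local devs="${1-${CUDA_VISIBLE_DEVICES-}}"
    if [ -z "$devs" ]; then
        echo 0
        return
    fi
    echo "$devs" | tr ',' '\n' | grep -c .
}

# Each Gravity node has two M2090s; warn inside a job if fewer are visible.
if [ -n "${PBS_JOBID:-}" ] && [ "$(count_visible_gpus)" -lt 2 ]; then
    echo "warning: CUDA_VISIBLE_DEVICES='${CUDA_VISIBLE_DEVICES-}'" >&2
fi
```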

Software

The same software installed on the GPC is available on Gravity, using the same modules framework. See the GPC documentation for full details.

Programming Frameworks

Four programming frameworks are currently available: NVIDIA's CUDA framework, PGI's CUDA Fortran, PGI's implementation of OpenACC, and OpenCL.

NVIDIA toolkit

Driver Version

The current NVIDIA driver version for gravity is 331.67.

CUDA

The CUDA Toolkits currently installed for Gravity are 3.2, 4.0, 4.1 (default), 4.2, 5.0, 5.5, and 6.0. A cuda/6.5 module is installed as well, but only for the K20 GPU on the arcX node; that version will not work on the Gravity nodes. To use CUDA 6.0 (recommended for the Gravity nodes), use the following module command:

module load gcc/4.8.1 cuda/6.0

(gcc is a prerequisite of the cuda module; earlier versions of gcc will likely also work.)


The CUDA driver is installed locally, but the CUDA Toolkits are installed in:

/scinet/arc/cuda-$VERSION/

The environment variable $SCINET_CUDA_INSTALL is set when a cuda module is loaded and points to the install location. This is useful when setting up makefiles. If you use the NVIDIA_SDK build environment, modify the NVIDIA_SDK/C/common/common.mk file accordingly:

CUDA_INSTALL_PATH = $(SCINET_CUDA_INSTALL)
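Outside the SDK build system, the same variable can be used to point gcc at the toolkit when host code calls the CUDA runtime. This is a sketch that assumes the standard toolkit layout (include/ and lib64/ under the install root) and linking against libcudart; cuda_flags is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print include/link flags for the loaded CUDA toolkit.
# Assumes the standard layout $SCINET_CUDA_INSTALL/{include,lib64}.
cuda_flags() {
    if [ -z "${SCINET_CUDA_INSTALL:-}" ]; then
        echo "SCINET_CUDA_INSTALL is not set; load a cuda module first" >&2
        return 1
    fi
    local root="$SCINET_CUDA_INSTALL"
    echo "-I${root}/include -L${root}/lib64 -lcudart"
}
```

For example, after loading a cuda module, a host file could be built with gcc host.c $(cuda_flags) -o host, where host.c is a placeholder; the substitution is left unquoted on purpose so the flags split into separate words.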

The NVIDIA CUDA compiler is called nvcc. By default it uses gcc/4.4.6 for CUDA < 4.1, while cuda/4.2 uses gcc/4.6.1.

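You'll have to tell the CUDA compiler about the capabilities of the Fermi card by supplying the flag

-arch=sm_13
or
-arch=sm_20

-arch=sm_20 matches the compute capability 2.0 of the M2090 and enables Fermi-specific features; -arch=sm_13 targets the older compute capability 1.3.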
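A minimal compile line for the M2090s, as a sketch: it uses -arch=sm_20 to match compute capability 2.0 and assumes a cuda module is already loaded; saxpy.cu is a placeholder file name.

```shell
#!/bin/bash
# Flags for the Tesla M2090 (Fermi, compute capability 2.0).
NVCC_FLAGS="-O2 -arch=sm_20"

# Compile only if nvcc is on the PATH (i.e. a cuda module is loaded)
# and the placeholder source file exists. $NVCC_FLAGS is intentionally
# unquoted so it splits into separate options.
if command -v nvcc >/dev/null 2>&1 && [ -f saxpy.cu ]; then
    nvcc $NVCC_FLAGS -o saxpy saxpy.cu
fi
```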

NVIDIA Toolkit

For CUDA 5.0 and 5.5, the SDK code samples can be copied from the directory

$SCINET_CUDA_INSTALL/samples/
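Since the install tree is shared, the samples are best copied into your own space before building them. A sketch, with copy_cuda_samples a hypothetical helper and ~/cuda-samples an arbitrary default destination:

```shell
#!/bin/bash
# Hypothetical helper: copy the CUDA 5.x samples from the shared install
# into a user-writable directory (default: ~/cuda-samples).
copy_cuda_samples() {
    if [ -z "${SCINET_CUDA_INSTALL:-}" ]; then
        echo "SCINET_CUDA_INSTALL is not set; load cuda/5.0 or cuda/5.5 first" >&2
        return 1
    fi
    local src="$SCINET_CUDA_INSTALL/samples"
    local dest="${1:-$HOME/cuda-samples}"
    if [ ! -d "$src" ]; then
        echo "no samples directory at $src" >&2
        return 1
    fi
    # Copy the directory contents (not the directory itself) into dest.
    mkdir -p "$dest" && cp -r "$src/." "$dest/"
}
```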

NOTE: Not all of the CUDA and OpenCL examples will compile, since many require OpenGL graphics libraries that are not installed on the nodes.

OpenCL

As of CUDA 3.0, OpenCL 1.1 is included in the CUDA Toolkit, so loading a cuda module is all that is required.

PGI compilers

As of July 2012, the PGI suite of compilers is installed. It can be accessed with

$  module load gcc/4.6.1 pgi/12.6

(If you use the older pgi/12.5, gcc/4.4.6 is required instead; it is used, for instance, by the CUDA parts of the PGI compilers.) These compilers use their own CUDA installation, so you do not need to load an additional cuda module. By default they use CUDA 4.1, but you can request CUDA 4.2 with the -Mcuda=4.2 option.

The compilation commands are pgcc, pgcpp and pgfortran for C, C++ and Fortran, respectively. As usual, we advise compiling with optimization, using the flags

-O4 -fastsse

The compilers will then optimize for the specific machine that you are compiling on.

The PGI compilers also support OpenMP, through the compile and link flag

-mp
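Putting the recommended flags together, a PGI build of an OpenMP C code might look like this sketch, assuming the pgi module is loaded; omp_prog.c is a placeholder file name.

```shell
#!/bin/bash
# Optimisation flags recommended above, plus -mp for OpenMP
# (-mp is needed at both compile and link time; here one command does both).
PGI_FLAGS="-O4 -fastsse -mp"

# $PGI_FLAGS is intentionally unquoted so it splits into separate options.
if command -v pgcc >/dev/null 2>&1 && [ -f omp_prog.c ]; then
    pgcc $PGI_FLAGS -o omp_prog omp_prog.c
fi
```

When compiling and linking in separate steps, pass -mp to both.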

CUDA Fortran

The PGI Fortran compiler (pgfortran, also pgf77 and pgf90) understands the CUDA extensions to Fortran. It recognizes these extensions automatically in source files with the extension .cuf; otherwise, you have to specify

-Mcuda=4.1

OpenACC

OpenACC is a compiler-directive approach to GPGPU programming. The PGI compilers (c, c++ and fortran) have a partial implementation of this open specification. To switch this on, use the options

-acc -ta=nvidia -Mcuda=4.1
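As a sketch, the CUDA Fortran and OpenACC flags above might be used as follows; kernels.f90 and acc_prog.c are placeholder names. Because kernels.f90 does not use the .cuf extension, it needs -Mcuda explicitly.

```shell
#!/bin/bash
# CUDA Fortran in a .f90 file: no .cuf suffix, so -Mcuda must be given.
CUF_FLAGS="-Mcuda=4.1"
# OpenACC (partial PGI implementation) targeting NVIDIA GPUs.
ACC_FLAGS="-acc -ta=nvidia -Mcuda=4.1"

# Flag variables are intentionally unquoted so they split into options.
if command -v pgfortran >/dev/null 2>&1 && [ -f kernels.f90 ]; then
    pgfortran $CUF_FLAGS -o kernels kernels.f90
fi
if command -v pgcc >/dev/null 2>&1 && [ -f acc_prog.c ]; then
    pgcc $ACC_FLAGS -o acc_prog acc_prog.c
fi
```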

More documentation

Manuals are on the Tutorials and Manuals page.

Other compilers

  • gcc, g++, gfortran - GNU compilers (nvcc needs either the gcc-4.4 or gcc-4.6 module loaded to work correctly)
  • icc,icpc,ifort - Intel compiler

Debuggers

  • ddt - Allinea's graphical DDT debugger, in the ddt module. The most recent version, ddt 4.0, supports CUDA 4.0, 4.1, 4.2 and 5.0.
  • cuda-gdb - NVIDIA's text-based gdb variant, part of the cuda module.


Note that to debug both host and CUDA device code, you have to pass the pair of flags
-g -G
to nvcc (-g generates host debug information, -G device debug information).
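A debug build as a sketch, assuming a cuda module is loaded; saxpy.cu is a placeholder file name. Since -G turns off most device-code optimisation, debug binaries should be expected to run noticeably slower than release builds.

```shell
#!/bin/bash
# -g: host debug info; -G: device debug info (disables device optimisations).
DEBUG_FLAGS="-g -G -arch=sm_20"

# $DEBUG_FLAGS is intentionally unquoted so it splits into separate options.
if command -v nvcc >/dev/null 2>&1 && [ -f saxpy.cu ]; then
    nvcc $DEBUG_FLAGS -o saxpy_dbg saxpy.cu
    # Print how to start the text-based debugger on the new binary.
    echo "debug binary ready; start it with: cuda-gdb ./saxpy_dbg"
fi
```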

MPI

The GPC MPI packages can be used on this system. See the GPC section on MPI for more details.

These MPI packages should also work with the PGI compilers, but this has not been tested, and standard wrappers like mpif90 may not work.

Alternatively, for mpi compilations with the PGI compilers, you can load the mpich1 mpi implementation with
module load mpich1/pgi
after which you can use the option
-Mmpi
or the wrapper scripts mpicc, mpiCC and mpif90, as well as mpirun.
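A sketch of the two routes described above, run after module load mpich1/pgi; mpi_prog.c is a placeholder. Preferring the wrapper when it is on the PATH and otherwise falling back to -Mmpi is an arbitrary ordering chosen for this example.

```shell
#!/bin/bash
# Optimisation flags recommended in the PGI section.
PGI_OPT="-O4 -fastsse"
SRC=mpi_prog.c   # placeholder source file

# $PGI_OPT is intentionally unquoted so it splits into separate options.
if [ -f "$SRC" ]; then
    if command -v mpicc >/dev/null 2>&1; then
        # Route 2: the mpich1 wrapper script.
        mpicc $PGI_OPT -o mpi_prog "$SRC"
    elif command -v pgcc >/dev/null 2>&1; then
        # Route 1: pgcc with the -Mmpi option.
        pgcc -Mmpi $PGI_OPT -o mpi_prog "$SRC"
    fi
fi
```

The resulting binary is then launched with the matching mpirun, for example from a job script like the one in the Compute section.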


Documentation

  • OpenCL
    • see the OpenCL section above

User Codes

Please add any relevant information, problems, or best practices you have encountered when using or developing for CUDA and/or OpenCL.
