Revised: Tue Dec 13 2011 by email@example.com
This is an introductory graduate course covering several aspects of
high-performance computing, primarily focused on parallel computing.
Upon completion, you should
- be able to design and analyze parallel algorithms for a variety of
problems and computational models,
- be familiar with the hardware and software organization of
high-performance parallel computing systems, and
- have experience with the implementation of parallel applications
on high-performance computing systems, and be able to measure,
tune, and report on their performance.
Additional information can be found in the course syllabus. All parallel
programming models discussed in this class are supported on the Bass system,
which will be available for use in this class.
- The final exam will be a 24-hour take-home exam, available in my office
at noon on Thu Dec. 15 (or earlier, by arrangement).
- Thu Dec 1: guest lecturer Lars Nyland will talk about the
CUDA architecture and recent
performance analysis and debugging tools.
- Thanks to Prof. Manocha and the Gamma group, a couple
of workstations are available for CUDA debugging:
Gamma17 is dual boot (Linux/Windows);
Gamma25 is Windows only.
Each machine has an NVIDIA GTX480 GPU. These machines are in
Brooks Glab and are located next to each other in the southwest
corner of the Gamma area. Gamma group members have priority on
these machines, so please keep that in mind.
- Midterm exam is scheduled for Tue Oct 25. You may use all notes, readings, and handouts
during the exam, but no computers.
- The final exam for this class, according to the registrar, is scheduled for
noon, Thursday Dec. 15.
- (Oct 13) node bass-comp15 is no longer dedicated for class use; please use
qlogin or batch submission to run jobs on bass.
- (Oct 3) node bass-comp15 is dedicated for class use with the current assignment
(to be accessed as described in class).
- (Sep 6) Everyone should have a login on bass.cs.unc.edu at this point.
Reading Assignments (some material local-access only)
- (for Tue Nov 29) Read Kumar et al.,
Basic Communication Operations.
- (for Thu Nov 10) Read Foster, Chapter 8,
Overview of MPI.
- (for Thu Nov 3) Read Skillicorn et al.,
Questions and Answers about BSP.
- (for Tue Oct 18) Read Nyland et al.,
Fast N-Body Simulation with CUDA.
- (for Tue Oct 11) Read
Hennessy & Patterson Ch. 8, sections 8.5 - 8.6
(synchronization primitives in shared memory, and
memory consistency models).
- (for Thu Oct 6) Read
Memory consistency models tutorial.
- (For Thu Sep 29)
The Implementation of the Cilk-5 Multithreaded Language,
sections 1 - 3.
- (For Thu Sep 22)
OpenMP Tutorial, sections 4.8 - 7.
- (For Thu Sep 15) Look through
OpenMP Tutorial up through worksharing DO/for (sections 1 - 4.6).
- (For Tue Sep 13) Read the overview of
Memory Hierarchy in Cache-based Systems.
- (For Tue Sep 6) PRAM Handout (review section 4.1), section 5.
- (For Thu Sep 1) PRAM Handout, sections 3.4, 3.6.
- (For Tue Aug 30) PRAM Handout, sections 3.2, 3.3, 3.5.
- (For Thu Aug 25) Read
PRAM Handout, sections 1, 2, 3.1 (pp. 1 - 8).
Written and Programming Assignments
- (Nov 15) Written assignment WA3 handed out in class due Tue Nov 29
(extension: now due no later than start of class on Dec. 1)
- (Oct 27) Programming assignment PA2 handed out in class. Project selection due Nov 8,
Final submission due Dec. 6.
- (Oct 6) Written assignment WA2 handed out in class due Thu Oct 13.
- (Sep 15) Programming assignment PA1(b) handed out in class due Thu Oct 4.
- (Sep 1) Programming assignment PA1(a) handed out in class due Thu Sep 15.
- (Aug 25) Written assignment WA1 handed out in class. Extended due date Tue
We will be using the Bass system for
programming assignments. The Bass system supports all the programming models
studied in this course. The general instructions for
getting started on bass
are supplemented below with specific instructions for each programming model.
When you log in to bass.cs.unc.edu you are connected to the front end. You can compile
programs there. Shared-memory programs run within an individual node on bass. Don't run your
shared-memory programs on the front end for more than a few seconds or with more than 4
processors! Distributed-memory programs or programs that use GPUs must be submitted through
queues that are managed by the Sun Grid Engine.
- OpenMP reference: Specification of
OpenMP binding for C/C++.
- Bass-specific material
Getting started on Bass.
- To get accurate performance information, run your programs
on a dedicated node, either as a batch job with a shell script myjob using
qsub -pe smp 16 myjob
or interactively via
qlogin -pe smp 16
Do not park yourself on this node, as everyone else in the class will be held up.
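The qsub command above expects a job script as its last argument. A minimal sketch of such a script follows; the program name prog and the thread count are placeholders, and the #$ lines are standard Grid Engine directives embedded as comments:

```
#!/bin/sh
# myjob: batch script for a 16-way shared-memory run on a dedicated node.
# Submit with: qsub -pe smp 16 myjob
#$ -cwd          # run the job from the directory where it was submitted
#$ -j y          # merge stdout and stderr into a single output file
OMP_NUM_THREADS=16   # placeholder: match the slot count given to -pe smp
export OMP_NUM_THREADS
./prog               # placeholder for your executable
```

The job's output appears in a file named myjob.oNNNN (NNNN is the job number) in the submission directory.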
- A directory with the sample diffusion program
discussed in class.
- Command lines for the compilation and execution of programs on
bass.cs.unc.edu. To access the SunStudio Ceres compiler (5.10), you
need to make sure that /opt/sunstudioceres/bin is on your PATH before
/usr/bin (else you will get the gcc compiler).
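One way to arrange this (a sketch, assuming an sh-style shell) is to prepend the directory to PATH, e.g. in your shell startup file:

```shell
# Prepend the SunStudio Ceres bin directory so that `cc` resolves there
# before /usr/bin (otherwise you get gcc).
export PATH=/opt/sunstudioceres/bin:$PATH
# Sanity check: print the first PATH component.
echo "${PATH%%:*}"    # prints /opt/sunstudioceres/bin
```

You can also run `which cc` afterwards to confirm which compiler you will get.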
- C compilation to create a sequential program (compiler ignores OpenMP
directives and does not link with the OpenMP runtime library):
cc -fast -o prog prog.c (SunStudio Ceres compiler 5.10) or
gcc -O3 -o prog prog.c (GNU C compiler 4.1.2)
- C compilation to create a parallel program (OpenMP directives honored
and program linked with the OpenMP runtime library)
cc -xopenmp=parallel -fast -o prog prog.c (SunStudio Ceres compiler 5.10) or
gcc -fopenmp -O3 -o prog prog.c (GNU C compiler 4.1.2)
- The task parallel capabilities in OpenMP 3.0 require that you use
gcc44 (GCC 4.4) to compile your programs.
Cilk and Cilk++
- This Cilk reference manual
refers to a slightly older revision of the Cilk system, but
is accurate with respect to the language definition.
- Cilk runs on Bass, but there is no public installation.
We recommend you install and use Cilk++ instead of Cilk.
- Cilk++ can be downloaded from
Intel. For Bass select the 64-bit linux version. You can install it
in your home directory. You can run it via qlogin or batch submission just
like OpenMP programs.
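For orientation, the classic Cilk++ example is parallel Fibonacci. This is a sketch in Cilk++ syntax; it requires the Cilk++ toolchain, not an ordinary C++ compiler, and cilk_main is the Cilk++ program entry point:

```cpp
#include <cilk.h>     // Cilk++ runtime header (Cilk++ toolchain only)
#include <cstdio>

// cilk_spawn forks a child computation that may run in parallel with
// the parent; cilk_sync waits for all children spawned in this function.
long fib(int n)
{
    if (n < 2) return n;
    long a = cilk_spawn fib(n - 1);  // child may run in parallel
    long b = fib(n - 2);             // parent continues meanwhile
    cilk_sync;                       // join the spawned child
    return a + b;
}

int cilk_main(int argc, char *argv[])
{
    std::printf("fib(30) = %ld\n", fib(30));
    return 0;
}
```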
- CUDA 2.3 Programming Guide (Aug 2009)
- The CUDA 4.0 SDK is installed on bass, including the CUDA compiler nvcc.
- Compile your code on the login node. See the instructions on the
Bass web page to set the environment correctly.
- Get a login on a GPU node: qlogin -l gpu_host,gpus=1 and run your program.
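To verify the CUDA toolchain end to end, a minimal vector-add program (a sketch; all names are illustrative) can be compiled with nvcc on the login node and run via qlogin as above:

```cuda
#include <cstdio>

// Element-wise vector add: one thread per element.
__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a = new float[n], *b = new float[n], *c = new float[n];
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // Allocate device buffers and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);  cudaMalloc(&db, bytes);  cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vadd<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", c[0]);   // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] a; delete[] b; delete[] c;
    return 0;
}
```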
- Java threads reference material
- Java threads execute in parallel on any Bass node.
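For example, a minimal sketch (class and method names are placeholders) that forks several threads, joins them, and combines their partial results:

```java
// SumThreads: split the range 1..n across worker threads and join them.
public class SumThreads {
    // Sum 1..n using nThreads threads, each handling a contiguous chunk.
    public static long parallelSum(final long n, int nThreads)
            throws InterruptedException {
        final long[] partial = new long[nThreads];
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            final long lo = id * n / nThreads + 1;
            final long hi = (id + 1) * n / nThreads;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long s = 0;
                    for (long i = lo; i <= hi; i++) s += i;
                    partial[id] = s;   // each worker writes its own slot
                }
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < nThreads; t++) {
            workers[t].join();         // wait for worker t to finish
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parallelSum(10000, 4));  // prints 50005000
    }
}
```

Because each worker writes a distinct slot of the partial array and join() happens before the reads, no further synchronization is needed.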
- MPI reference material
- Running MPI programs on Bass
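For orientation, a minimal MPI program in C (a sketch; it needs an MPI installation, and on Bass it must be launched through the batch queues, not on the front end):

```c
#include <mpi.h>
#include <stdio.h>

/* Each rank reports in; rank 0 also sums the ranks with a reduction. */
int main(int argc, char *argv[])
{
    int rank, size, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("hello from rank %d of %d\n", rank, size);

    /* Reduce everyone's rank to rank 0: sum = 0 + 1 + ... + (size-1). */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks = %d\n", sum);

    MPI_Finalize();
    return 0;
}
```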
This list will evolve throughout the semester. Specific reading
assignments are listed above.
- PRAM Algorithms, S. Chatterjee, J. Prins,
course notes, 2009.
- Memory Hierarchy in Cache-Based Systems,
R. van der Pas, Sun Microsystems, 2003.
- OpenMP Tutorial, Blaise Barney.
- Multithreaded, Parallel and Distributed Programming,
G. Andrews, Addison-Wesley, 2000.
- Computer Architecture: A Quantitative Approach, 2nd ed.,
J. Hennessy, D. Patterson, Morgan-Kaufmann, 1996.
- Fast N-Body Simulation with CUDA, L. Nyland, M. Harris, J. Prins,
in GPU Gems 3, H. Nguyen, ed., Prentice-Hall, 2007.
- "Questions and Answers about BSP", D. Skillicorn, J. Hill,
and W. McColl, Scientific Programming 6, 1997.
- Designing and Building Parallel Programs, I. Foster, Addison-Wesley, 1995.
- Introduction to Parallel Computing: Design and Analysis of Algorithms,
V. Kumar, A. Grama, A. Gupta, G. Karypis, Benjamin-Cummings, 1994.
This page is maintained by email@example.com.
Send mail if you find problems.