Revised: Tue Nov 24 2015 by firstname.lastname@example.org
This is an introductory graduate course covering several aspects of
parallel and high-performance computing.
Upon completion, you should
- be able to design and analyze parallel algorithms for a variety of
problems and computational models,
- be familiar with the hardware and software organization of
high-performance parallel computing systems, and
- have experience with the implementation of parallel applications
on high-performance computing systems, and be able to measure,
tune, and report on their performance.
Additional information including the course syllabus can be found in the
All parallel programming models discussed in this class are supported on
or phaedra which are available for use in this class.
The midterm will be given in-class on Tue Oct 13.
- The scope of the exam is material in lectures 1-12.
- Exam papers will be handed out as soon as the previous class leaves. You may open
your papers when instructed (12:25 PM). The papers will be collected no later
than 1:50 PM (many of you have an exam in the next class period).
- You may consult all course notes and other course materials during the exam.
If you use an electronic device to access these materials, the device cannot be
used to access other materials or to communicate in any fashion. You may use
- Some practice questions are available here
- This course will use Piazza to manage class questions and discussions online.
(some material local-access only)
- (for Tue Nov 17) Read Kumar et al.,
Basic Communication Operations.
- (for Tue Nov 10) Skim
MPI tutorial by Blaise Barney, LLNL.
- (for Thu Nov 5) Skim the
Questions and Answers about BSP
pp 1-25. We will not use BSPLib directly, rather we use the BSP model together
with communication operations from the MPI library.
- (for Thu Oct 8) Read Nyland et al.,
Fast N-Body Simulation with CUDA.
Check the supplementary materials for CUDA in the Software section below.
- (for Thu Oct 1) Read
Hennessy & Patterson Ch. 8, sections 8.5 - 8.6
(synchronization primitives in shared memory, and
memory consistency models).
- (for Thu Sep 24) Read
Memory consistency models tutorial (sections 1-6, pp 1-17).
- (For Thu Sep 17)
The Implementation of the Cilk-5 Multithreaded Language,
sections 1 - 3.
- (For Tue Sep 15) Look through
OpenMP Tutorial sections 7-9. The gcc compiler on
bass.cs.unc.edu is release 4.4.7 and supports OpenMP 3.0,
while this material corresponds to version 3.1, which
differs only in very minor ways, not ones you are likely
- (For Thu Sep 10) Look through
OpenMP Tutorial sections 1-6. Most examples are
shown in C/C++ and in Fortran. We will be using C/C++.
Ignore WORKSHARE and TASK directives, and discussion of
nested parallel constructs.
- (for Tue Sep 8) Read the overview of
Memory Hierarchy in Cache-based Systems.
- (For Thu Sep 3) PRAM Handout, (review section 4.1), section 5.
- (For Thu Aug 27) PRAM Handout, sections 3.4, 3.6
- (For Tue Aug 25) PRAM handout, sections 3.2, 3.3, 3.5
- (For Thu Aug 20) Read PRAM Handout sections 1, 2, 3.1 (pp 1-8)
- (For Thu Aug 20) Look over the course overview.
Written and Programming Assignments
- (Aug 27) Written assignment WA1 is available. Due date is Sep 15.
Sample Solutions available.
- (Sep 10) Programming assignment PA1a is available. Due date is Tue Sep 22.
Paper submission instructions:
Turn in one document per team that clearly identifies both team members, and includes the performance graph
and discussion of the results. Also be sure to identify the login of the team member who uploaded the code
for this assignment.
Code submission instructions:
One team member should copy the files needed to build your code from the bass login node
where <yourlogin> should be replaced with your CS login.
Alternatively, you can upload files or directories from anywhere using the command
scp -pr <localfile-or-dir> <yourlogin>@classroom.cs.unc.edu:/afs/unc/project/courses/comp633/Submit/<yourlogin>/pa1a
In this case you will be prompted for your CS password.
- (Oct 6)
Programming assignment PA1b is available.
Due date has been changed to Mon Oct 26.
Submission instructions are the same as for PA1a: hand in your graphs on paper
(to me, or slide them under my door), and have one team member upload the code
to submission directory pa1b. Your paper submission should identify your
team and the login of the person who uploaded the code to the submission
- (Oct 1) Written Assignment WA2 is available.
Due date is Thu Oct 8 at the start of class.
Sample solutions are available for WA2.
- (Nov 3)
Programming Assignment PA2 is available.
Problem selection is due by Nov. 12. The due date is Wed Dec. 2, with submissions accepted through
Monday Dec. 7.
- (Nov 12)
Written Assignment WA3 is available. Due date is Tue Dec 1 at start of class.
We will be using the Bass system for
programming assignments. The Bass system supports all the programming models studied in
this class except those used with Intel Xeon Phi accelerators and Cilk --
a separate compute server phaedra provides access to these capabilities.
The general instructions for
getting started on bass
are supplemented below with specific instructions for each programming model.
When you log in to bass.cs.unc.edu you are connected to a specific node on Bass dedicated
to interactive program development. You can compile programs on this node.
Shared-memory programs run within an individual node on Bass. Distributed-memory programs run
across multiple nodes on Bass. The login node should not be used to run your programs,
although a short debug test that runs for a few seconds on no more than 4 cores is OK.
In general, programs that need multiple nodes, dedicated nodes, or GPUs should be
submitted to queues that are managed by the Grid Engine job scheduler.
- OpenMP reference: Specification of
OpenMP 3.0 API for C/C++.
You may be more interested in
OpenMP support in gcc 4.4.7 (the compiler on Bass).
- Bass-specific material
Getting started on Bass.
- To get accurate performance information, run your programs
on a dedicated node, either as a batch job with a shell script myjob using
qsub -pe smp 16 myjob
or interactively via
qlogin -pe smp 16
Do not park yourself on this node as everyone else in the class will be held up.
- A directory with the sample diffusion program
discussed in class.
- Command lines for the compilation and execution of programs on
- C compilation to create a sequential program (compiler ignores OpenMP
directives and does not link with the OpenMP runtime library):
gcc -O3 -o prog prog.c (GNU C compiler 4.4.7)
- C compilation to create a parallel program (OpenMP 3.0 directives honored
and program linked with the OpenMP runtime library)
gcc -fopenmp -O3 -o prog prog.c (GNU C compiler 4.4.7)
- CUDA 4.2 on Bass
- CUDA 5.0 on killdevil and CUDA 5.5 on other GPU-equipped machines
- MPI reference material
- Running MPI programs on Bass
Phaedra is a 20-core Xeon E5-2650 server with eight attached Intel Xeon Phi 5110P accelerators. The
server hosts Intel compilers and performance analysis tools to access the accelerators as well
as shared memory parallel programming models for the server cores.
If you wish to use this machine, you need to contact me first so access can be set up.
This list will evolve throughout the semester. Specific reading
assignments are listed above.
- PRAM Algorithms, S. Chatterjee, J. Prins,
COMP 633 course notes, 2013.
- Memory Hierarchy in Cache-Based Systems,
R. v.d. Pas, Sun Microsystems, 2003.
- OpenMP tutorial, Blaise Barney, LLNL, 2013.
- The Implementation of the Cilk-5 Multithreaded Language,
M. Frigo, C. Leiserson, K. Randall, in
Proceedings of ACM Conf. on Programming Language Design and
- Shared Memory Consistency Models: A Tutorial,
S. V. Adve, K. Gharachorloo, DEC Western Research Labs Report 95/7, 1995.
- Computer Architecture: A Quantitative Approach, 2nd ed.,
J. Hennessy, D. Patterson, Morgan Kaufmann, 1996.
- Fast N-Body Simulation with CUDA, L. Nyland, M. Harris, J. Prins,
GPU Gems 3, 2008.
- An Overview of Programming for Intel Xeon processors and Intel Xeon Phi coprocessors,
James Reinders, Intel Corp, 2012.
- Questions and Answers about BSP, D. Skillicorn, J. Hill,
and W. McColl, Scientific Programming 6, 1997.
- Message Passing Interface,
Blaise Barney, LLNL, 2015.
- Introduction to Parallel Computing: Design and Analysis of
V. Kumar, A. Grama, A. Gupta, G. Karypis, Benjamin-Cummings, 1994.
This page is maintained by
Send mail if you find problems.