CSCI 5451, Introduction to Parallel Programming
Fall 2025, Mondays and Wednesdays, 8:15am to 9:30am, Lind Hall 302
Course Information
This course is an introduction to parallel computing. It covers parallel architectures, parallel algorithms, and their analysis, and it introduces programming on parallel platforms. Labs will be written primarily in C. OpenMP, MPI, CUDA for NVIDIA Graphics Processing Units (GPUs), and (optionally) NCCL will also be covered. The course begins with a more theoretical focus (the complexity of parallel algorithms and their efficiency), then gradually shifts to a more practical focus on particular algorithms (sorting, graph algorithms, matrix algorithms) and their use on CUDA architectures.
Most course information (lectures, homework descriptions and due dates, code stubs presented during lecture, office hours, etc.) will be posted to the course website. It is your responsibility to check this website and attend lectures for updates on homework assignments, their due dates, and their requirements. We will use Canvas only for homework submission and grading.
- Instructors
- James Mooney (Instructor), Wenjie Zhang (Graduate TA)
- Class meets
- Mondays and Wednesdays, 8:15am to 9:30am, Lind Hall 302
- Office hours
- James: Tuesday, 3:30pm to 4:30pm, Shepherd 439
- Wenjie: Thursday, 4:00pm to 5:00pm, Keller Hall 6-210
- Class page
- https://jimtmooney.github.io/Courses/F25/index.html
- Canvas
- canvas.umn.edu/courses/518528
Grading, Evaluation & Late Policy
Grading
Evaluation
Your evaluation for this course will be based on 5 homeworks (15% each) and 1 final course project (25%).
Homeworks will be released roughly every two weeks, beginning in the second week of September. Each release will consist of a PDF writeup of the assignment along with some small unit tests and a serial version of the program. Each homework submission must contain a zip file of the program itself along with a markdown file describing the solution. Homework submissions are due in Canvas at 11:59pm CST on the due date.
Homework grading will consist of two portions. (1) An auto-graded portion checking that (a) the program compiles, (b) the program passes a series of unit tests corresponding to the given problem, and (c) the program achieves a significant speedup over a serial version of the program (what constitutes significant will be stated in the homework PDF). (2) A human-graded portion checking that the program uses the frameworks and methods intended in the homework description (e.g. APIs are not used to make the work easier, and the specific CUDA libraries and strategies we ask for are used), and that the writeup aligns with the program itself and displays sufficient knowledge of the methods used.
We will run the autograding portion of the homework grading at 11:59pm on each of the three days before the submission deadline. If you submit your program earlier, this will give you the opportunity to see what your autograded score would be and to debug accordingly.
For the project, you will work in groups of 3-4 to parallelize a real-world program. Think early about who you would like to work with and which programs you would like to parallelize. The project should target a program that is difficult to parallelize or that will have high impact when parallelized (e.g. an open source project with slow, serial parts). I will give regular updates about the project in lecture and will meet with each group individually 1-2 times during the semester to ensure that the project is on track.
The grades will be assigned according to the following scale, where T is the total score (out of 100) you have achieved in this course.
| A : 100 ≥ T ≥ 94 | A- : 94 > T ≥ 88 | B+ : 88 > T ≥ 82 |
| B : 82 > T ≥ 77 | B- : 77 > T ≥ 72 | C+ : 72 > T ≥ 65 |
| C : 65 > T ≥ 60 | C- : 60 > T ≥ 55 | D+ : 55 > T ≥ 50 |
| D : 50 > T ≥ 40 | F : 40 > T |
Late policy for deliverables
For the homeworks, a late penalty of 2.5% will be incurred for every 3 hours (or part thereof) that the assignment is past due. We use this fine-grained policy because we know many students will submit their work on the night of the deadline; a slightly late submission still incurs a penalty, but a minor one. A full day late will result in a 20% penalty, 2 days 40%, etc. Refer to the equation below to determine the exact percentage deducted from your final grade based on how late your assignment is.
Percentage Deducted = Math.ceil(# hours since due time / 3) * 2.5
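As a rough illustration of the equation above, here is a minimal C sketch. The function name `late_penalty_percent` and the choice to measure lateness in whole minutes are assumptions made for this example and are not part of the course policy. The sketch also applies no upper cap, since none is stated here.

```c
/* Late-penalty sketch following the syllabus formula:
 *   Percentage Deducted = ceil(hours late / 3) * 2.5
 * Assumptions (not in the syllabus): lateness is given in whole minutes,
 * on-time or early submissions get 0, and no upper cap is applied. */
double late_penalty_percent(long minutes_late)
{
    if (minutes_late <= 0) {
        return 0.0; /* submitted on time: no deduction */
    }
    /* Integer ceiling of minutes_late / 180: every started 3-hour block counts. */
    long blocks = (minutes_late + 179) / 180;
    return (double)blocks * 2.5;
}
```

Under these assumptions, `late_penalty_percent(24 * 60)` evaluates to 20.0, which matches the full-day penalty stated above, and a submission one minute late evaluates to 2.5.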
Late projects will not be accepted for grades unless under extenuating circumstances made clear in advance.
Schedule
| Date | Lectures, Supplementals & Due Dates | Readings |
| Sep 3 | Class Overview | |
| Sep 8 | Parallel Architectures | Grama, Chapters 2.1-2.3 |
| Sep 10 | Parallel Architectures (Cont'd) | Grama, Chapter 2.4 |
| Sep 15 | Design of Parallel Algorithms; [Supplemental] Server Test Description; [Supplemental] Server Test Code | Grama, Chapters 3.1-3.3 |
| Sep 17 | From Tasks to Processors | Grama, Chapters 2.5-2.7, 3.4-3.5 |
| Sep 22 | Mapping (From Sep 17); Threading | Grama, Chapters 2.7, 3.5, 7.1-7.2 |
| Sep 24 | Threads to OpenMP | Grama, Chapters 7.1-7.5, 7.10.1 |
| Sep 29 | OpenMP in Depth; HW1 out --> Due: Oct 15 (Canvas Link here) | Grama, Chapter 7.10.1 |
| Oct 1 | Basic Communication Operations | Grama, Chapters 4.1-4.4 |
| Oct 6 | Introduction to MPI | Grama, Chapters 6.1-6.3 |
| Oct 8 | MPI in Practice | Grama, Chapters 6.3-6.4 |
| Oct 10 | HW1 Autograder | |
| Oct 13 | MPI Collective Communications; Group Formation Due --> Oct 19 (Canvas Link here) | Grama, Chapters 6.5-6.6 |
| Oct 15 | MPI Examples | Grama, Chapters 6.6-6.7 |
| Oct 20 | MPI Examples (Cont'd); HW2 out --> Due: Nov 2 (Canvas Link here) | Grama, Chapters 6.6-6.7 |
| Oct 22 | MPI Examples (Cont'd) | Grama, Chapters 6.6-6.7 |
| Oct 27 | Analytical Modeling | Grama, Chapters 5.1-5.3 |
| Oct 29 | Advanced Modeling | Grama, Chapters 5.4-5.7 |
| Nov 3 | Introduction to CUDA | Hwu, Chapters 1-3 |
| Nov 10 | CUDA Compute Architecture | Hwu, Chapter 4 |
| Nov 12 | CUDA Memory Architecture | Hwu, Chapter 5 |
| Nov 17 | Additional Performance Considerations; HW3 out --> Due: Nov 30 (Canvas Link here); Project Planning Meeting --> Due: Nov 26 (Canvas Link here) | Hwu, Chapter 6 |
| Nov 19 | CUDA Worklog | Simon Boehm Worklog |
| Nov 24 | CUDA Worklog (Cont'd); HW4 out --> Due: Dec 8 (Canvas Link here) | Simon Boehm Worklog |
| Dec 1 | Convolutions in CUDA; HW5 out --> Due: Dec 21 (Canvas Link here) | Hwu, Chapter 7 |
| Dec 3 | Histograms in CUDA | Hwu, Chapter 9 |
| Dec 8 | Reduction in CUDA | Hwu, Chapter 10 |
| Dec 10 | Practical Extensions; Project Report --> Due: Dec 18 (Canvas Link here) | |
Homework Assignments
This section contains the homework assignments for this course. If assignments below have no links or descriptions, they have not yet been released.
Here are the homework assignments with their due dates:
Project Details
Full details about the project can be found here.
This document describes the expectations for the project and what grading will entail. Read it in full to get an idea of what will be expected of you within your group over the next month.
Prerequisites
This course assumes that you are comfortable with C syntax, debugging, and algorithms. It is not an introductory programming course, but a course in applying programming to the parallel setting. We assume that you will be able to pick up new frameworks and their core ideas quickly. We will not teach the basics of C programming before diving into the work, nor will we focus on the exact workings of some of the algorithms.