CSCI 5451, Introduction to Parallel Programming

Fall 2025, Mondays and Wednesdays, 8:15am to 9:30am, Lind Hall 302

Course Information

This course is an introduction to parallel computing. It covers parallel architectures and parallel algorithms and their analysis, and it introduces programming on parallel platforms. The main programming language used for the labs will be C. OpenMP, MPI, CUDA for NVIDIA Graphics Processing Units (GPUs), and (optionally) NCCL will also be covered. The course begins with a more theoretical focus (the complexity of parallel algorithms and their efficiency), then gradually shifts to a more practical focus on particular algorithms (sorting, graph algorithms, matrix algorithms) and their use on CUDA architectures.

Most course information (lectures, homework descriptions and due dates, code stubs presented during lecture, office hours, etc.) will be posted to the course website. It is your responsibility to track this website and attend lectures to see updates regarding homework assignments and their due dates and requirements. We will use Canvas only for homework submission and grading.

Instructors: James Mooney (Instructor); Wenjie Zhang (Graduate TA)
Class meets: Mondays and Wednesdays, 8:15am to 9:30am, Lind Hall 302
Office hours: James: Tuesday, 3:30pm to 4:30pm, Shepherd 439; Wenjie: Thursday, 4:00pm to 5:00pm, Keller Hall 6-210
Class page: https://jimtmooney.github.io/Courses/F25/index.html
Canvas: canvas.umn.edu/courses/518528

Grading, Evaluation & Late Policy

Grading

75%	Homework	5 individual homeworks (15% each)
25%	Project	Team project of 3-4 people

Evaluation

Your evaluation for this course will be based on 5 homeworks (15% each) and 1 final course project (25%).

Homeworks will be released roughly every 2 weeks, beginning in the second week of September. Each release will consist of a PDF writeup of the assignment along with some small unit tests and a serial version of the program. Each homework submission must contain a zip file of the program itself along with a markdown file describing the solution. Submissions are due in Canvas at 11:59pm Central Time on the due date.

Homework grading will consist of two portions. (1) An auto-graded portion that checks that (a) the program compiles, (b) the program passes a series of unit tests corresponding to the given problem, and (c) the program achieves a significant speedup over a serial version of the program (what constitutes significant will be stated in the homework PDF). (2) A human-graded portion that checks that the program uses the frameworks and methods intended by the homework description (e.g. APIs are not used to make the work easier, and the specific CUDA libraries and strategies we want you to use are used), that the writeup aligns with the program itself, and that the writeup displays sufficient knowledge of the methods used.

We will run the autograding portion of the homework grading at 11:59pm on each of the three days before the submission deadline. If you submit your program earlier, this will give you the opportunity to see what your autograded score would be and to debug accordingly.

For the project, you will work in groups of 3-4 to parallelize a real-world program. Think early about who you would like to work with and which programs you would like to parallelize. The project should target a program that is difficult to parallelize or that will have high impact when parallelized (e.g. an open source project with slow, serial parts). I will give regular project updates in lecture and will meet with each group individually 1-2 times during the semester to ensure that the project is on track.

The grades will be assigned according to the following scale, where T is the total score (out of 100) you have achieved in this course.

A : 100 ≥ T ≥ 94	A- : 94 > T ≥ 88	B+ : 88 > T ≥ 82
B : 82 > T ≥ 77	B- : 77 > T ≥ 72	C+ : 72 > T ≥ 65
C : 65 > T ≥ 60	C- : 60 > T ≥ 55	D+ : 55 > T ≥ 50
D : 50 > T ≥ 40	F : 40 > T
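The weighting and the scale above can be sketched in C as follows. This is only an illustration of the arithmetic, not the official grade calculation: the function names `total_score` and `letter_grade` are hypothetical, and the sketch assumes each homework and the project are scored out of 100 before weighting.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_HOMEWORKS 5

/* Hypothetical helper: combine five homework scores and one project score
 * (each assumed to be out of 100) into the total T, using the stated
 * weights of 15% per homework and 25% for the project. */
double total_score(const double homework[NUM_HOMEWORKS], double project)
{
    double homework_sum = 0.0;
    for (size_t i = 0; i < NUM_HOMEWORKS; ++i) {
        homework_sum += homework[i];
    }
    return (15.0 * homework_sum + 25.0 * project) / 100.0;
}

/* Hypothetical helper: map T (0 to 100) to a letter using the scale above.
 * Each row holds the inclusive lower bound of its band. */
const char *letter_grade(double t)
{
    static const struct { double min; const char *letter; } scale[] = {
        {94.0, "A"},  {88.0, "A-"}, {82.0, "B+"}, {77.0, "B"},  {72.0, "B-"},
        {65.0, "C+"}, {60.0, "C"},  {55.0, "C-"}, {50.0, "D+"}, {40.0, "D"},
    };
    assert(t >= 0.0 && t <= 100.0); /* T is defined as a score out of 100 */
    for (size_t i = 0; i < sizeof scale / sizeof scale[0]; ++i) {
        if (t >= scale[i].min) {
            return scale[i].letter;
        }
    }
    return "F"; /* 40 > T */
}
```

For example, five homework scores of 80 and a project score of 90 give T = 82.5, which falls in the B+ band.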

Late policy for deliverables

For the homeworks, a late penalty of 2.5% will be incurred for every 3 hours (or part thereof) that the assignment is past due. We use this finer-grained policy because we know many students will submit their work the night of the deadline; a slightly late submission still incurs a penalty, but a minor one. A full day late results in a 20% penalty, 2 days in 40%, etc. Refer to the equation below to determine the exact percentage deducted from your final grade based on how late your assignment is.

Percentage Deducted = Math.ceil(# hours since due time / 3) * 2.5
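As a rough illustration, the equation above can be written as a small C function. The name `late_penalty_percent` is hypothetical, and two details are assumptions not stated in the policy: lateness is measured in whole minutes (so the ceiling can be done in integer arithmetic), and the deduction is capped at 100%.

```c
#include <assert.h>

#define MINUTES_PER_BLOCK 180   /* one penalty step per started 3-hour block */
#define PERCENT_PER_BLOCK 2.5   /* percentage deducted per block */

/* Hypothetical helper: percentage deducted for a submission that is
 * `minutes_late` minutes past the due time (0 means on time). */
double late_penalty_percent(long minutes_late)
{
    assert(minutes_late >= 0); /* pass 0 for on-time or early submissions */

    /* Integer ceiling of minutes_late / 180, i.e. ceil(hours late / 3). */
    long blocks = (minutes_late + MINUTES_PER_BLOCK - 1) / MINUTES_PER_BLOCK;
    double penalty = (double)blocks * PERCENT_PER_BLOCK;

    /* Assumption: the deduction cannot exceed the full grade. */
    return penalty > 100.0 ? 100.0 : penalty;
}
```

Under these assumptions, a submission 3 hours and 1 minute late (181 minutes) has started its second 3-hour block and loses 5%, while one exactly 24 hours late loses 20%, matching the figures quoted in the policy.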

Late projects will not be accepted for grades except under extenuating circumstances communicated in advance.

Schedule

Date	Lectures, Supplementals & Due Dates	Readings
Sep 3	Class Overview
Sep 8	Parallel Architectures	Grama, Chapters 2.1-2.3
Sep 10	Parallel Architectures (Cont'd)	Grama, Chapter 2.4
Sep 15	Design of Parallel Algorithms; [Supplemental] Server Test Description; [Supplemental] Server Test Code	Grama, Chapters 3.1-3.3
Sep 17	From Tasks to Processors	Grama, Chapters 2.5-2.7, 3.4-3.5
Sep 22	Mapping (From Sep 17); Threading	Grama, Chapters 2.7, 3.5, 7.1-7.2
Sep 24	Threads to OpenMP	Grama, Chapters 7.1-7.5, 7.10.1
Sep 29	OpenMP in Depth; HW1 out --> Due: Oct 15 (Canvas Link here)	Grama, Chapter 7.10.1
Oct 1	Basic Communication Operations	Grama, Chapters 4.1-4.4
Oct 6	Introduction to MPI	Grama, Chapters 6.1-6.3
Oct 8	MPI in Practice	Grama, Chapters 6.3-6.4
Oct 10	HW1 Autograder
Oct 13	MPI Collective Communications; Group Formation --> Due: Oct 19 (Canvas Link here)	Grama, Chapters 6.5-6.6
Oct 15	MPI Examples	Grama, Chapters 6.6-6.7
Oct 20	MPI Examples (Cont'd); HW2 out --> Due: Nov 2 (Canvas Link here)	Grama, Chapters 6.6-6.7
Oct 22	MPI Examples (Cont'd)	Grama, Chapters 6.6-6.7
Oct 27	Analytical Modeling	Grama, Chapters 5.1-5.3
Oct 29	Advanced Modeling	Grama, Chapters 5.4-5.7
Nov 3	Introduction to CUDA	Hwu, Chapters 1-3
Nov 10	CUDA Compute Architecture	Hwu, Chapter 4
Nov 12	CUDA Memory Architecture	Hwu, Chapter 5
Nov 17	Additional Performance Considerations; HW3 out --> Due: Nov 30 (Canvas Link here); Project Planning Meeting --> Due: Nov 26 (Canvas Link here)	Hwu, Chapter 6
Nov 19	CUDA Worklog	Simon Boehm Worklog
Nov 24	CUDA Worklog (Cont'd); HW4 out --> Due: Dec 8 (Canvas Link here)	Simon Boehm Worklog
Dec 1	Convolutions in CUDA; HW5 out --> Due: Dec 21 (Canvas Link here)	Hwu, Chapter 7
Dec 3	Histograms in CUDA	Hwu, Chapter 9
Dec 8	Reduction in CUDA	Hwu, Chapter 10
Dec 10	Practical Extensions; Project Report --> Due: Dec 18 (Canvas Link here)

Homework Assignments

This section contains the homework assignments for this course. If assignments below have no links or descriptions, they have not yet been released.

Homework assignments and their due dates:

HW1 --> Due: Oct 15 (Canvas Link here)
- Homework PDF (Containing Instructions for Submission & Rubric)
- Homework Code Stubs (Containing Unit Tests & initial serial executable)
- Autograder (For running the tests on plate yourself)
HW2 --> Due: Nov 2 (Canvas Link here)
- Homework Descriptions (Containing Instructions for Homework)
- Homework Code Stubs (Containing initial code, unit tests, & autograder)
HW3 --> Due: Nov 30 (Canvas Link here)
- Homework Descriptions (Containing Instructions for Homework)
- Homework Code Stubs (Containing initial code, unit tests, & autograder)
- Block Cyclic Figure Diagrams
HW4 --> Due: Dec 8 (Canvas Link here)
- Homework Descriptions (Containing Instructions for Homework)
- Homework Code Stubs (.ipynb file + homework description + .cu file)
HW5 --> Due: Dec 21 (Canvas Link here)
- Homework Descriptions (Containing Instructions for Homework)
- Homework Code Stubs (homework description + .cu files + autograder)

Project Details

Full details about the project can be found here.

This document describes the expectations for the project and what grading will entail. Read it in full to get an idea of what will be expected of you within your group over the next month.

Group Formation --> Due: Oct 19 (Canvas Link here)
Project Proposal --> Due: Nov 26 (Canvas Link here)
Project Final Report & Code --> Due: Dec 18 (Canvas Link here)

Prerequisites

This course assumes that you are comfortable with C syntax, debugging, and algorithms. This is not an introductory programming course; it is a course on applying programming to the parallel setting. We assume that you will be able to pick up new frameworks and their core ideas quickly. We will not teach the basics of C programming before diving into the work, nor will we focus on the exact workings of some of the algorithms.