CS 4370/6370 Fall 2022 Program Assignment # 2 – Tiled Matrix Multiplication Solution in CUDA

1. The objective

The objective of this programming assignment is to implement tiled matrix multiplication and to gain a better understanding of shared memory.
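The tiling idea behind this objective can be sketched as a CUDA kernel. This is a minimal sketch under stated assumptions, not the required solution: the tile size `TILE_WIDTH = 16`, the kernel name `MatrixMulTiled`, and the host wrapper `MatrixMulOnDevice` are hypothetical names chosen for illustration, and the matrices are assumed square and stored in row-major order, as in the host pseudocode. In each phase, a thread block stages one tile of M and one tile of N in shared memory, so each value loaded from global memory is reused by `TILE_WIDTH` threads.

```cuda
#include <cuda_runtime.h>

// Assumed tile size (hypothetical choice); each block is TILE_WIDTH x TILE_WIDTH threads.
#define TILE_WIDTH 16

// Tiled P = M * N for square Width x Width row-major matrices.
// Each block computes one TILE_WIDTH x TILE_WIDTH tile of P;
// each thread computes one element of P.
__global__ void MatrixMulTiled(const float* M, const float* N, float* P, int Width)
{
    // Tiles of M and N staged in shared memory, visible to all threads in the block.
    __shared__ float Ms[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Ns[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE_WIDTH + ty;
    int col = blockIdx.x * TILE_WIDTH + tx;

    float sum = 0.0f;
    // Round up so a Width that is not a multiple of TILE_WIDTH is still covered.
    int numPhases = (Width + TILE_WIDTH - 1) / TILE_WIDTH;
    for (int ph = 0; ph < numPhases; ++ph) {
        // Each thread loads one element of each tile; out-of-range slots get 0.
        int mCol = ph * TILE_WIDTH + tx;
        int nRow = ph * TILE_WIDTH + ty;
        Ms[ty][tx] = (row < Width && mCol < Width) ? M[row * Width + mCol] : 0.0f;
        Ns[ty][tx] = (nRow < Width && col < Width) ? N[nRow * Width + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE_WIDTH; ++k)
            sum += Ms[ty][k] * Ns[k][tx];
        __syncthreads();  // everyone done reading before the next phase overwrites
    }
    if (row < Width && col < Width)
        P[row * Width + col] = sum;
}

// Hypothetical host wrapper: copies M and N to the device, launches the
// kernel, copies P back, and returns the last CUDA error seen (if any).
cudaError_t MatrixMulOnDevice(const float* M, const float* N, float* P, int Width)
{
    size_t bytes = (size_t)Width * Width * sizeof(float);
    float *dM = nullptr, *dN = nullptr, *dP = nullptr;
    cudaMalloc(&dM, bytes);
    cudaMalloc(&dN, bytes);
    cudaMalloc(&dP, bytes);
    cudaMemcpy(dM, M, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dN, N, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_WIDTH, TILE_WIDTH);
    dim3 grid((Width + TILE_WIDTH - 1) / TILE_WIDTH,
              (Width + TILE_WIDTH - 1) / TILE_WIDTH);
    MatrixMulTiled<<<grid, block>>>(dM, dN, dP, Width);

    // This blocking copy waits for the kernel to finish before copying P back.
    cudaMemcpy(P, dP, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();  // most recent error from the calls above
    cudaFree(dM);
    cudaFree(dN);
    cudaFree(dP);
    return err;
}
```

Both `__syncthreads()` calls are needed: the first keeps threads from reading a tile before it is fully loaded, and the second keeps faster threads from overwriting a tile that slower threads are still reading. The bounds checks let the same kernel handle matrix sizes that are not multiples of `TILE_WIDTH`.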

2. Submission

A team can have up to 3 students. All students in the same team will receive the same grade.

a. Each team submits only one copy of the programming assignment, including the CUDA source program, a readme, and a report.

b. Each team member must submit a list of the names of all team members.

3. Project description: Tiled Matrix Multiplication

In this project, you will develop a complete CUDA program for tiled matrix multiplication. You will multiply two two-dimensional matrices, A and B, on the GPU (the device). After invoking the device matrix multiplication, your program will also compute the correct solution matrix on the CPU and compare it with the device-computed solution. If the two match within a tolerance of 0.000001, the program prints "Test PASSED" to the screen before exiting.
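The comparison step described above can be sketched as a small host function. This is a minimal sketch, assuming both results are stored as row-major `Width x Width` float arrays; the name `ResultsMatch` and its signature are hypothetical, not prescribed by the assignment. It uses an absolute-difference test against the given tolerance and reports the first mismatching element.

```cuda
#include <cmath>
#include <cstdio>

// Hypothetical helper: returns true if every element of P_device is within
// `tolerance` (absolute difference) of the host reference P_host.
// Both arrays are assumed to hold Width x Width floats in row-major order.
bool ResultsMatch(const float* P_host, const float* P_device, int Width, float tolerance)
{
    for (int i = 0; i < Width * Width; ++i) {
        if (std::fabs(P_host[i] - P_device[i]) > tolerance) {
            // Report the first mismatch to help with debugging.
            std::printf("Mismatch at row %d, col %d: host %f, device %f\n",
                        i / Width, i % Width, P_host[i], P_device[i]);
            return false;
        }
    }
    return true;
}
```

A typical call after both results are available would be `if (ResultsMatch(P_cpu, P_gpu, Width, 0.000001f)) printf("Test PASSED\n");`, where `P_cpu` and `P_gpu` are illustrative names. Because the host and device may accumulate the sums in a different order and precision, an absolute tolerance this tight can reject a correct kernel for large matrices or large input values; a relative tolerance is one common alternative.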

The pseudocode for matrix multiplication on the CPU is as follows:

void MatrixMulOnHost(float* M, float* N, float* P, int Width)
{
    for (int row = 0; row < Width; ++row)
        for (int col = 0; col < Width; ++col) {
            double sum = 0;
            for (int k = 0; k < Width; ++k) {
                float a = M[row * Width + k];
                float b = N[k * Width + col];