CS 4370/6370 Project # 1 – Matrix Addition and Matrix Multiplication Solution using CUDA
- R K Gaur

- Jul 11
- 6 min read
Updated: Sep 4
Introduction to Matrix Operations
In this project, I will guide you through the implementation of matrix addition and multiplication using CUDA. These operations are fundamental in many scientific and engineering applications. By leveraging the power of GPUs, we can perform these operations in parallel, significantly improving performance.
Task 1: Basic Matrix Addition
For this task, you will develop a complete CUDA program for integer matrix addition. You will add two two-dimensional matrices, A and B, on the device GPU in parallel. After invoking the device matrix addition kernel function, the result will be transferred back to the CPU. Your program will also compute the sum matrix of matrices A and B using the CPU. Finally, it will compare the device-computed result with the CPU-computed result. If they match, the program will print "Test PASSED" to the screen before exiting.
Pseudo Code for Matrix Addition on the CPU
Here’s how you can implement matrix addition on the CPU:
```c
void add_matrix_cpu(int *a, int *b, int *c, int N) {
    int i, j, index;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            index = i * N + j;  // row-major flattening of (i, j)
            c[index] = a[index] + b[index];
        }
    }
}

int main() {
    // Your code here
    add_matrix_cpu(a, b, c, N);
    return 0;
}
```
Pseudo Code for Matrix Addition on the GPU
The following is the CUDA C program for matrix addition on the GPU:
```c
__global__ void add_matrix_gpu(int *a, int *b, int *c, int N) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int index = row * N + col;
    if (row < N && col < N)
        c[index] = a[index] + b[index];
}

int main() {
    dim3 dimBlock(blocksize, blocksize, 1);
    dim3 dimGrid((int)ceil((double)N / dimBlock.x), (int)ceil((double)N / dimBlock.y), 1);
    add_matrix_gpu<<<dimGrid, dimBlock>>>(a, b, c, N);
    return 0;
}
```
Matrix Initialization
Use the following pseudo code for matrix initialization:
```c
int *a, *b, *c;
a = (int *)malloc(sizeof(int) * N * N); // N is the matrix dimension
// then malloc for b and c
int init = 1325;
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        init = 3125 * init % 65536;
        a[i * N + j] = (init - 32768) / 6553;
        b[i * N + j] = init % 1000;
    }
}
```
Matrix Size and Thread Block Size
Use the following matrix sizes and thread block sizes (the number of threads in each block) to test your CUDA program:
| Matrix Size | Size of Thread Block |
|-------------|----------------------|
| 8x8 | 4x4 (for debugging purposes) |
| 128x128 | 16x16 |
| 500x500 | 16x16 |
| 1000x1000 | 16x16 |
Task 2: Matrix Multiplication
In this task, you will develop a complete CUDA program for matrix multiplication. You will multiply two two-dimensional matrices, A and B, on the device GPU in parallel. After invoking the device matrix multiplication kernel function, the result will be transferred back to the CPU. Your program will also compute the product matrix of matrices A and B using the CPU. Finally, it will compare the device-computed result with the CPU-computed result. If they match, the program will print "Test PASSED" to the screen before exiting.
Pseudo Code for Matrix Multiplication on the CPU
Here’s how you can implement matrix multiplication on the CPU:
```c
void MatrixMulOnHost(int *M, int *N, int *P, int Width) {
    for (int i = 0; i < Width; ++i) {
        for (int j = 0; j < Width; ++j) {
            int sum = 0;
            for (int k = 0; k < Width; ++k) {
                int a = M[i * Width + k];
                int b = N[k * Width + j];
                sum += a * b;
            }
            P[i * Width + j] = sum;
        }
    }
}

int main() {
    // Your code here
    MatrixMulOnHost(a, b, c, N);
    return 0;
}
```
Pseudo Code for Matrix Multiplication on the GPU
The following is the CUDA C program for matrix multiplication on the GPU:
```c
__global__ void MatrixMulKernel(int *M, int *N, int *P, int Width) {
    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;
    if ((Row < Width) && (Col < Width)) {
        int Pvalue = 0;
        for (int k = 0; k < Width; ++k) {
            Pvalue += M[Row * Width + k] * N[k * Width + Col];
        }
        P[Row * Width + Col] = Pvalue;
    }
}

int main() {
    dim3 dimBlock(blocksize, blocksize, 1);
    dim3 dimGrid((int)ceil((double)N / dimBlock.x), (int)ceil((double)N / dimBlock.y), 1);
    MatrixMulKernel<<<dimGrid, dimBlock>>>(a, b, c, N);
    return 0;
}
```
Matrix Initialization for Multiplication
Use the following pseudo code for matrix initialization:
```c
int *a, *b, *c;
a = (int *)malloc(sizeof(int) * N * N); // N is the matrix dimension
// then malloc for b and c
int init = 1325;
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        init = 3125 * init % 65536;
        a[i * N + j] = (init - 32768) / 6553;
        b[i * N + j] = init % 1000;
    }
}
```
Matrix Size and Thread Block Size for Multiplication
Use the following matrix sizes and thread block sizes (the number of threads in each block):
| Matrix Size | Size of Thread Block |
|-------------|----------------------|
| 8x8 | 4x4 (for debugging purposes) |
| 128x128 | 16x16 |
| 500x500 | 16x16 |
| 1024x1024 | 16x16 |
Requirements
To use the CUDA compiler environment installed on the CS Unix server, fry.cs.wright.edu, connect to the server remotely with a secure shell client such as PuTTY. On campus, you can connect from a Wright State computer or from your own laptop on the "WSU-Secure" Wi-Fi network. You cannot SSH in from outside Wright State University unless you first install the university VPN on your computer (or use the campus "WSU_EZ_CONNECT" Wi-Fi network). If you want to edit your CUDA source programs under Windows, download Notepad++; after editing, use a secure file transfer client such as WinSCP to transfer your CUDA source programs to fry.cs.wright.edu.
You must submit an ELECTRONIC COPY of your source program through Pilot before the due date. If Pilot is unavailable, submit your source code by email to meilin.liu@wright.edu.
Submit all your source code, a README file, a report, and any other required files. In the README, explain clearly how to compile and run your programs. In your report, state whether your programs implement all the functionality required in the project description, and clearly note anything not implemented. If your program works correctly, include screenshots in your report. Your submitted file names should include your last name, for example Liu_Project1.cpp, Liu_Project1_Report, Liu_Project1_ReadMe, etc. All submitted project files should include: course number / course title, your name, your group member's name, the professor's name, the date, and the project name. If you do not include these required contents in your submitted files, 5 points will be deducted.
The grader or the instructor will test your programs under the CUDA environment on the Linux server, fry.cs.wright.edu. Before submitting your program, connect to this server using your campus ID to test it. I have demonstrated how to compile and execute a CUDA program on this server. If you have questions, let me know.
This programming assignment is individual: you must complete the project yourself. If you allow others to copy your programs or answers, you will receive the same penalty as those who copy yours.
How to Use CUDA on fry.cs.wright.edu
First, use PuTTY or other secure shell clients to connect to fry.cs.wright.edu using your campus ID (for example, w123abc). Then run the following command:
```bash
srun -p a100 --gres=gpu:1 --pty bash
```
This command will request access to a GPU node and launch a bash shell on it.
Then you can compile a CUDA program, vectadd.cu, using the following command under the directory where your source CUDA program is located:
```bash
nvcc vectadd.cu -o vectadd
```
Finally, execute vectadd using the following command under the directory where the generated executable file (of your CUDA source program), vectadd, is located:
```bash
./vectadd
```