# CS 4370/6370 Fall 2022 Project # 1 – Matrix Addition and Matrix Multiplication Solution using CUDA

Updated: Sep 17


For this task, you will develop a complete CUDA program for integer matrix addition. You will add two two-dimensional matrices A and B on the GPU device in parallel. After the device matrix addition kernel is invoked, the result will be transferred back to the CPU. Your program will also compute the sum of matrices A and B on the CPU. Your program should compare the device-computed result with the CPU-computed result; if they match, it will print "Test PASSED" to the screen before exiting.

The pseudo code for matrix addition on the CPU is as follows:

void add_matrix_cpu(int *a, int *b, int *c, int N)
{
    int i, j, index;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            index = i * N + j;
            c[index] = a[index] + b[index];
        }
    }
}

int main() {
    .....
}

The pseudo code for matrix addition on the GPU device is as follows:

CUDA C program

__global__ void add_matrix_gpu(int *a, int *b, int *c, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int index = row * N + col;
    if (row < N && col < N)
        c[index] = a[index] + b[index];
}

int main() {
    dim3 dimBlock(blocksize, blocksize, 1);
    dim3 dimGrid(ceil((double)N / dimBlock.x), ceil((double)N / dimBlock.y), 1);
    add_matrix_gpu<<<dimGrid, dimBlock>>>(a, b, c, N);
}

Use the following pseudo code for matrix initialization.

int *a, *b, *c;
a = (int *)malloc(sizeof(int) * N * N);  // N is the matrix width
// then malloc for b and c

int init = 1325;
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        init = 3125 * init % 65536;
        a[i * N + j] = (init - 32768) / 6553;
        b[i * N + j] = init % 1000;
    }
}

Use the following matrix sizes and thread block sizes (the number of threads in each block) to test your CUDA program.

| Matrix size | Thread block size |
|---|---|
| 8×8 | 4×4 (for debugging) |
| 128×128 | 16×16 |
| 500×500 | 16×16 |
| 1000×1000 | 16×16 |

For this task, you will develop a complete CUDA program for matrix multiplication. You will multiply two two-dimensional matrices A and B on the GPU device in parallel. After the device matrix multiplication kernel is invoked, the result will be transferred back to the CPU. Your program will also compute the product of matrices A and B on the CPU. Your program should compare the device-computed result with the CPU-computed result; if they match, it will print "Test PASSED" to the screen before exiting.

The pseudo code for matrix multiplication on the CPU is as follows:

void MatrixMulOnHost(int* M, int* N, int* P, int Width)
{
    for (int i = 0; i < Width; ++i)
        for (int j = 0; j < Width; ++j) {
            int sum = 0;
            for (int k = 0; k < Width; ++k) {
                int a = M[i * Width + k];
                int b = N[k * Width + j];
                sum += a * b;
            }
            P[i * Width + j] = sum;
        }
}

int main() {
    .....
}

The pseudo code for matrix multiplication on the GPU device is as follows:

CUDA C program

__global__ void MatrixMulKernel(int* M, int* N, int* P, int Width)
{
    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;
    if ((Row < Width) && (Col < Width)) {
        int Pvalue = 0;
        for (int k = 0; k < Width; ++k)
            Pvalue += M[Row * Width + k] * N[k * Width + Col];
        P[Row * Width + Col] = Pvalue;
    }
}

int main() {
    dim3 dimBlock(blocksize, blocksize, 1);
    dim3 dimGrid(ceil((double)Width / dimBlock.x), ceil((double)Width / dimBlock.y), 1);
    MatrixMulKernel<<<dimGrid, dimBlock>>>(M, N, P, Width);
}

Use the following pseudo code for matrix initialization.

int *a, *b, *c;
a = (int *)malloc(sizeof(int) * N * N);  // N is the matrix width
// then malloc for b and c

int init = 1325;
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        init = 3125 * init % 65536;
        a[i * N + j] = (init - 32768) / 6553;
        b[i * N + j] = init % 1000;
    }
}

Use the following matrix sizes and thread block sizes (the number of threads in each block).

| Matrix size | Thread block size |
|---|---|
| 8×8 | 4×4 (for debugging) |
| 128×128 | 16×16 |
| 500×500 | 16×16 |
| 1024×1024 | 16×16 |

Requirements:

2. You must submit an ELECTRONIC COPY of your source program through Pilot before the due date. If for some reason Pilot is unavailable, submit your source code by email to meilin.liu@wright.edu.

4. The grader or the instructor will test your programs in the CUDA environment on the Linux server fry.cs.wright.edu. Before you submit your program, please connect to this server using your campus ID and test your program there (I have demoed how to compile and execute a CUDA program on this server; if you have questions, let me know).

5. The programming assignment is individual. You must finish the project by yourself. If you allow others to copy your programs or answers, you will receive the same penalty as those who copy yours.

How to use CUDA on fry.cs.wright.edu

First, use PuTTY or another secure shell client to connect to fry.cs.wright.edu with your campus ID (for example, w123abc), then run the following command:

srun -p a100 --gres=gpu:1 --pty bash

This command will request access to a gpu node and launch a bash shell on it.

Then you can compile a CUDA program vectadd.cu with the NVIDIA CUDA compiler, nvcc, under the directory where your source file is located:

nvcc vectadd.cu -o vectadd

Then you can execute the generated executable, vectadd, from the directory where it is located:

./vectadd
