Commit 96bf5a3

transferred repo contents from other account
1 parent 66b86e8 commit 96bf5a3

31 files changed: 3564 additions, 1 deletion

.gitignore

Lines changed: 3 additions & 0 deletions

*.pgm
.vscode
a.out

Makefile

Lines changed: 11 additions & 0 deletions

CC = nvcc
CFLAGS = -g -G

main: main.cu
	$(CC) $(CFLAGS) shape.cu dataset.cu cross_entropy.cu linear_layer.cu relu.cu softmax.cu nn_model.cu main.cu -o main

sequential: sequential.cpp
	g++ -o sequential sequential.cpp

clean:
	-rm main sequential

Picture1.png (57.8 KB)

Picture2.png (50.8 KB)

README.md

Lines changed: 101 additions & 1 deletion

# cuda_neural_nets

This was done as part of an assignment in the CS 547 - High Performance Computing course under [Prof. Kenneth Chiu](https://www.binghamton.edu/computer-science/people/profile.html?id=kchiu).

<span style="color:red">

_The work done here may only be used as reference material. The work here is not to be submitted as your own, with or without edits._

</span>

---

Compiling and running the CUDA neural network:

```
$ make main
$ ./main
```

Compiling and running the sequential neural network:

```
$ make sequential
$ ./sequential
```

---

## Write-Up

### Different Modules

- The MNIST (Modified National Institute of Standards and Technology) database is a large database of handwritten digits that is commonly used for training various image processing systems. It consists of 60,000 training images and 10,000 test images along with their corresponding labels.

- A separate MNISTDataset class handles reading the MNIST dataset and loading it as batches.

- thrust::host_vector and thrust::device_vector are used to represent all weights and matrices and to transfer data between device and host.

- raw_pointer_cast is used to cast a device_vector to a float* for the __global__ functions, since they work only on primitive float arrays and not on Thrust device vectors (see the launch sketch after this list).

- The Shape class stores the dimensions of every matrix used.

- The NNLayer class acts as the base class for all layer classes in the project, similar to nn.Module in PyTorch.

- There is no explicitly constructed input layer. Instead, the Dataset class provides the batches in the required format, i.e., each image is flattened into a single 1-D array.

- The linear layer implements a fully connected layer. Weights are randomly initialized during object construction.

  - The forward, backprop, and update_weights_bias methods are overridden in this class.
  - The forward method computes the forward pass, evaluating Z = W.A + b.
  - linearLayerForward is the __global__ kernel that computes Z. A 2-D grid of threads is created; each thread's Y index selects the row and its X index selects the column of the result matrix (see the kernel sketch after this list).
  - The backprop method computes the backward pass, calculating the downstream gradient. Its kernel structure closely mirrors the forward pass.

- Softmax layer

  - Since this is a multi-class classification problem, a softmax activation function is needed to compute probabilities for all classes.
  - For numerical stability, the inputs are shifted by a constant C equal to the maximum input; each output is then exp(x_i - C) divided by the sum of exp(x_j - C) over all inputs (see the sketch after this list).
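
linear_layer.cu itself is not among the files shown in this diff, so below is a minimal sketch of the kernel layout and launch described above; the parameter names and the row-major (Wy x Wx) shapes are illustrative assumptions, not the repository's exact code.

```
#include <thrust/device_vector.h>

// Sketch only: computes Z = W.A + b for row-major W (Wy x Wx), A (Wx x Ax),
// Z (Wy x Ax). Each thread produces one element of Z.
__global__ void linearLayerForward(const float* W, const float* A,
                                   const float* b, float* Z,
                                   int Wx, int Wy, int Ax) {
    int col = blockIdx.x * blockDim.x + threadIdx.x; // X index -> column of Z
    int row = blockIdx.y * blockDim.y + threadIdx.y; // Y index -> row of Z
    if (row < Wy && col < Ax) {
        float acc = 0.0f;
        for (int k = 0; k < Wx; k++)
            acc += W[row * Wx + k] * A[k * Ax + col];
        Z[row * Ax + col] = acc + b[row];
    }
}

// __global__ kernels take raw float*, so raw_pointer_cast unwraps the
// thrust::device_vector before the launch.
void forward(const thrust::device_vector<float>& W,
             const thrust::device_vector<float>& A,
             const thrust::device_vector<float>& b,
             thrust::device_vector<float>& Z,
             int Wx, int Wy, int Ax) {
    dim3 block(8, 8);
    dim3 grid((Ax + block.x - 1) / block.x, (Wy + block.y - 1) / block.y);
    linearLayerForward<<<grid, block>>>(
        thrust::raw_pointer_cast(W.data()), thrust::raw_pointer_cast(A.data()),
        thrust::raw_pointer_cast(b.data()), thrust::raw_pointer_cast(Z.data()),
        Wx, Wy, Ax);
}
```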
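
softmax.cu is likewise not shown in this diff; the following is a minimal host-side sketch of the stable softmax just described, with an illustrative function name.

```
#include <thrust/host_vector.h>
#include <algorithm>
#include <cmath>

// Numerically stable softmax: shifting by the maximum input C keeps exp()
// from overflowing; each output is exp(x_i - C) / sum_j exp(x_j - C).
thrust::host_vector<float> stable_softmax(const thrust::host_vector<float>& x) {
    float C = x[0];
    for (size_t i = 1; i < x.size(); i++)
        C = std::max(C, x[i]);            // C = max of the inputs

    thrust::host_vector<float> out(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); i++) {
        out[i] = std::exp(x[i] - C);      // shifted exponentials
        sum += out[i];
    }
    for (size_t i = 0; i < x.size(); i++)
        out[i] /= sum;                    // normalize to probabilities
    return out;
}
```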

### The final network

- Input Layer (Image)
- Linear Layer (1024 neurons) (Dropout = 0.4)
- ReLU Activation
- Linear Layer (10 neurons)
- Softmax Activation

### Accuracy

The accuracy of the CUDA neural network on the test set was 93.16% after the first epoch, improved to 99.11% by the second epoch, and remained at 99.11% thereafter.

### Profiling

The results were computed on my local machine under WSL2 (Windows Subsystem for Linux) with an NVIDIA RTX 3060 GPU. A single forward-and-backprop step was about 3 times faster in the CUDA neural network than in the sequential implementation. The batch size used was 100 with a learning rate of 0.001.

#### CUDA Neural Network

Time per epoch: 293590056 microseconds

Avg Time Per Step: 4893 microseconds

This is a plot of the time taken by each step in the first epoch, in microseconds: the X-axis is the step number and the Y-axis is the time taken.

<img src="Picture1.png" alt="cuda" width="500"/>

#### Sequential Neural Network

Time per mini-batch: 1557228 microseconds

Avg Time Per Step: 15339 microseconds

This is a plot of the time taken by each step in the first epoch, in microseconds: the X-axis is the step number and the Y-axis is the time taken.

<img src="Picture2.png" alt="seq" width="500"/>

cross_entropy.cu

Lines changed: 21 additions & 0 deletions

#include "cross_entropy.hh"

#include <math.h>
#include <iostream>

// Cross-entropy loss for a single sample: the negative log of the
// probability the network assigned to the target class.
float CECost::cost(host_vec preds, int target)
{
    float cost_value = 0.0;
    cost_value += -log(preds[target]);

    // std::cout << "Cost value: " << cost_value << std::endl;
    return cost_value;
}

// Gradient of the loss w.r.t. the predictions: only the target entry is
// nonzero, since d(-log p_t)/dp_t = -1/p_t.
host_vec CECost::dCost(host_vec preds, int target, host_vec dY) {

    dY[target] = -1.0 / preds[target];
    return dY;
}

cross_entropy.hh

Lines changed: 8 additions & 0 deletions

#pragma once
#include "nn_layer.hh"

// Cross-entropy cost and its gradient for a single sample's predictions.
class CECost {
public:
    float cost(host_vec predictions, int target);
    host_vec dCost(host_vec preds, int target, host_vec dY);
};

dataset.cu

Lines changed: 146 additions & 0 deletions

#include "dataset.hh"
#include "utils.hh"

#include <cassert>
#include <cmath>
#include <fcntl.h>
#include <fstream>
#include <iostream>
#include <unistd.h>
#include <vector>

// Reads the MNIST image file (IDX format), normalizes each 28x28 image to
// [-1, 1], flattens it, and groups the images into batches of batch_size.
void MNISTDataset::read_mnist_images() {

    int rv;
    int fd = open(fn_images.c_str(), O_RDONLY);
    assert(fd >= 0);

    int magic = read_int(fd);
    assert(magic == 0x803); // magic number of the IDX image file

    int n_images = read_int(fd);

    int n_rows = read_int(fd);
    assert(n_rows == 28);

    int n_cols = read_int(fd);
    assert(n_cols == 28);

    number_of_batches = n_images / batch_size;

    // Pre-allocate the image and target containers for every batch.
    for (size_t i = 0; i < number_of_batches; i++) {
        thrust::host_vector<thrust::host_vector<float>> batch(batch_size);
        batches.push_back(batch);

        thrust::host_vector<int> batch_targets(batch_size, 0);
        targets.push_back(batch_targets);
    }

    int batch_idx = 0;
    int idx = 0;
    int batch_img_idx = 0;
    for (int i = 0; i < n_images; i++) {
        if (i > 0 && i % batch_size == 0) {
            batch_idx++;
            batch_img_idx = 0;
        }

        unsigned char tmp[28][28];
        rv = read(fd, tmp, 28*28); assert(rv == 28*28);

        idx = 0;
        thrust::host_vector<float> temp_img(28*28, 0.0);
        for (int r = 0; r < 28; r++) {
            for (int c = 0; c < 28; c++) {
                // Scale pixel values from [0, 255] to [-1, 1].
                temp_img[idx] = float(tmp[r][c]) / 127.5 - 1;
                idx++;
            }
        }
        batches[batch_idx][batch_img_idx] = temp_img;
        batch_img_idx++; // advance to the next slot in the current batch
    }

    rv = close(fd); assert(rv == 0);
}

// Reads the MNIST label file (IDX format) and fills the per-batch targets
// so that targets[b][i] is the label of batches[b][i].
void MNISTDataset::read_mnist_labels() {

    int rv;
    int fd = open(fn_labels.c_str(), O_RDONLY);
    assert(fd >= 0);

    int magic = read_int(fd);
    assert(magic == 0x801); // magic number of the IDX label file

    int n_labels = read_int(fd);
    std::vector<unsigned char> labels(n_labels);

    rv = read(fd, labels.data(), n_labels); assert(rv == n_labels);

    int batch_idx = 0;
    int idx = 0;
    for (int i = 0; i < n_labels; i++) {
        assert(labels[i] <= 9);

        if (i > 0 && i % batch_size == 0) {
            batch_idx++;
            idx = 0;
        }

        targets[batch_idx][idx] = static_cast<int>(labels[i]);
        idx++; // advance to the next slot in the current batch
    }
    std::cout << n_labels << std::endl;
    std::cout << targets[0].size() << std::endl;

    rv = close(fd); assert(rv == 0);
}

MNISTDataset::MNISTDataset(size_t batch_size, const std::string &fn_images, const std::string &fn_labels):
    batch_size(batch_size), fn_images(fn_images), fn_labels(fn_labels)
{
    read_mnist_images();
    read_mnist_labels();
}

int MNISTDataset::getNumOfBatches() {
    return number_of_batches;
}

thrust::host_vector<thrust::host_vector<thrust::host_vector<float>>>& MNISTDataset::getBatches() {
    return batches;
}

thrust::host_vector<thrust::host_vector<int>>& MNISTDataset::getTargets() {
    return targets;
}

// Writes a single flattened image as an ASCII PGM file (inverted grayscale)
// so it can be inspected with an image viewer.
void output_pgm(const std::string &fn, thrust::host_vector<float>& img) {

    std::ofstream ofs(fn, std::fstream::out|std::fstream::trunc);

    ofs << "P2\n";
    ofs << "28 28\n";
    ofs << "255\n";
    int idx = 0;
    for (int i = 0; i < 28; i++) {
        for (int j = 0; j < 28; j++) {
            if (j > 0) {
                ofs << " ";
            }
            // Undo the [-1, 1] normalization and invert for display.
            ofs << 255 - int(std::round(127.5 * (img[idx] + 1)));
            idx++;
        }
        ofs << "\n";
    }
}

/* testing */
// int main()
// {
//     MNISTDataset data_obj(1, "mnist/train-images-idx3-ubyte", "mnist/train-labels-idx1-ubyte");
//     auto batches = data_obj.getBatches();
//     output_pgm("img0.pgm", batches[0][0]);
//     // output_pgm("img1.pgm", batches[5][4]);
//
//     return 0;
// }
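
Note: utils.hh, which provides read_int, is not among the files shown in this commit. The sketch below is an assumed implementation consistent with its use above, relying only on the fact that MNIST IDX headers store 32-bit integers in big-endian byte order.

#include <cassert>
#include <cstdint>
#include <unistd.h>

// Assumed sketch of utils.hh's read_int: reads a 32-bit big-endian integer
// (the MNIST IDX header encoding) and converts it to host byte order.
int read_int(int fd) {
    uint8_t bytes[4];
    ssize_t rv = read(fd, bytes, 4);
    assert(rv == 4);
    uint32_t v = (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16) |
                 (uint32_t(bytes[2]) << 8)  |  uint32_t(bytes[3]);
    return int(v);
}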

dataset.hh

Lines changed: 25 additions & 0 deletions

#pragma once

#include <string>
#include <thrust/host_vector.h>

// Loads the MNIST images and labels from disk and exposes them as
// host-side batches of flattened images with matching integer targets.
class MNISTDataset {
private:
    size_t batch_size;
    size_t number_of_batches;
    std::string fn_images, fn_labels;

    // batches[b][i] is the i-th flattened 28x28 image of batch b;
    // targets[b][i] is its digit label (0-9).
    thrust::host_vector<thrust::host_vector<thrust::host_vector<float>>> batches;
    thrust::host_vector<thrust::host_vector<int>> targets;

    void read_mnist_labels();
    void read_mnist_images();

public:

    MNISTDataset(size_t batch_size, const std::string &fn_images, const std::string &fn_labels);

    int getNumOfBatches();
    thrust::host_vector<thrust::host_vector<thrust::host_vector<float>>>& getBatches();
    thrust::host_vector<thrust::host_vector<int>>& getTargets();
};
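
For reference, a minimal usage sketch of the MNISTDataset API declared above; the mnist/ paths follow the commented-out test harness in dataset.cu.

#include "dataset.hh"
#include <iostream>

// Load the training set in batches of 100 and inspect the first batch.
int main() {
    MNISTDataset data(100, "mnist/train-images-idx3-ubyte",
                      "mnist/train-labels-idx1-ubyte");

    auto& batches = data.getBatches();  // [batch][image][pixel]
    auto& targets = data.getTargets();  // [batch][image]

    std::cout << data.getNumOfBatches() << " batches, first batch has "
              << batches[0].size() << " images, first label = "
              << targets[0][0] << std::endl;
    return 0;
}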
