Commit 96bf5a3

transferred repo contents from other account
1 parent 66b86e8 commit 96bf5a3

31 files changed: 3564 additions, 1 deletion

.gitignore

Lines changed: 3 additions & 0 deletions

*.pgm
.vscode
a.out

Makefile

Lines changed: 11 additions & 0 deletions

CC = nvcc
CFLAGS = -g -G

main: main.cu
	$(CC) $(CFLAGS) shape.cu dataset.cu cross_entropy.cu linear_layer.cu relu.cu softmax.cu nn_model.cu main.cu -o main

sequential: sequential.cpp
	g++ -o sequential sequential.cpp

clean:
	-rm main sequential

Picture1.png (57.8 KB)

Picture2.png (50.8 KB)

README.md

Lines changed: 101 additions & 1 deletion

# cuda_neural_nets

This was done as part of an assignment in the CS 547 - High Performance Computing course under [Prof. Kenneth Chiu](https://www.binghamton.edu/computer-science/people/profile.html?id=kchiu).

<span style="color:red">

_The work done here may only be used as reference material. The work here is not to be submitted as your own, with or without edits._

</span>

---

Compiling and running the CUDA neural network:

```
$ make main
$ ./main
```

Compiling and running the sequential neural network:

```
$ make sequential
$ ./sequential
```

---

## Write-Up

### Different Modules

- The MNIST (Modified National Institute of Standards and Technology) database is a large database of handwritten digits that is commonly used for training various image processing systems. It consists of 60,000 training images and 10,000 test images along with their corresponding labels.

- A separate MNISTDataset class handles reading the MNIST dataset and loading it as batches.

- thrust::host_vector and thrust::device_vector are used to represent all weights and matrices and to transfer data between device and host.

- raw_pointer_cast is used to cast a device_vector to a float* for the __global__ functions, since they work only on primitive float arrays and not on Thrust device vectors (see the launch sketch after this list).

- The Shape class stores the dimensions of every matrix used.

- The NNLayer class acts as the base class for all layer classes in the project, similar to nn.Module in PyTorch.

- There is no explicitly constructed input layer. Instead, the Dataset class provides the batches in the required format, i.e., each image is flattened into a single 1-D array.

- The linear layer implements a fully connected layer. Weights are randomly initialized during object construction.

  - The forward, backprop, and update_weights_bias methods are overridden in this class.
  - The forward method computes the forward pass, evaluating Z = W.A + b.
  - linearLayerForward is the __global__ kernel that computes Z. A 2-D grid of threads is created; each thread's Y index selects the row and its X index selects the column of the result matrix (see the kernel sketch after this list).
  - The backprop method computes the backward pass, calculating the downstream gradient. Its kernel structure closely mirrors the forward pass.

- Softmax layer

  - Since this is a multi-class classification problem, a softmax activation function is needed to compute probabilities for all classes.
  - For numerical stability, the inputs are shifted by a constant C equal to the maximum input; each output is then exp(x_i - C) divided by the sum of exp(x_j - C) over all inputs (see the sketch after this list).
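
linear_layer.cu itself is not among the files shown in this diff, so below is a minimal sketch of the kernel layout and launch described above; the parameter names and the row-major (Wy x Wx) shapes are illustrative assumptions, not the repository's exact code.

```
#include <thrust/device_vector.h>

// Sketch only: computes Z = W.A + b for row-major W (Wy x Wx), A (Wx x Ax),
// Z (Wy x Ax). Each thread produces one element of Z.
__global__ void linearLayerForward(const float* W, const float* A,
                                   const float* b, float* Z,
                                   int Wx, int Wy, int Ax) {
    int col = blockIdx.x * blockDim.x + threadIdx.x; // X index -> column of Z
    int row = blockIdx.y * blockDim.y + threadIdx.y; // Y index -> row of Z
    if (row < Wy && col < Ax) {
        float acc = 0.0f;
        for (int k = 0; k < Wx; k++)
            acc += W[row * Wx + k] * A[k * Ax + col];
        Z[row * Ax + col] = acc + b[row];
    }
}

// __global__ kernels take raw float*, so raw_pointer_cast unwraps the
// thrust::device_vector before the launch.
void forward(const thrust::device_vector<float>& W,
             const thrust::device_vector<float>& A,
             const thrust::device_vector<float>& b,
             thrust::device_vector<float>& Z,
             int Wx, int Wy, int Ax) {
    dim3 block(8, 8);
    dim3 grid((Ax + block.x - 1) / block.x, (Wy + block.y - 1) / block.y);
    linearLayerForward<<<grid, block>>>(
        thrust::raw_pointer_cast(W.data()), thrust::raw_pointer_cast(A.data()),
        thrust::raw_pointer_cast(b.data()), thrust::raw_pointer_cast(Z.data()),
        Wx, Wy, Ax);
}
```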
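
softmax.cu is likewise not shown in this diff; the following is a minimal host-side sketch of the stable softmax just described, with an illustrative function name.

```
#include <thrust/host_vector.h>
#include <algorithm>
#include <cmath>

// Numerically stable softmax: shifting by the maximum input C keeps exp()
// from overflowing; each output is exp(x_i - C) / sum_j exp(x_j - C).
thrust::host_vector<float> stable_softmax(const thrust::host_vector<float>& x) {
    float C = x[0];
    for (size_t i = 1; i < x.size(); i++)
        C = std::max(C, x[i]);            // C = max of the inputs

    thrust::host_vector<float> out(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); i++) {
        out[i] = std::exp(x[i] - C);      // shifted exponentials
        sum += out[i];
    }
    for (size_t i = 0; i < x.size(); i++)
        out[i] /= sum;                    // normalize to probabilities
    return out;
}
```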

### The final network

- Input Layer (Image)
- Linear Layer (1024 neurons) (Dropout = 0.4)
- ReLU Activation
- Linear Layer (10 neurons)
- Softmax Activation

### Accuracy

The accuracy of the CUDA neural network on the test set was 93.16% after the first epoch, improved to 99.11% by the second epoch, and remained at 99.11% thereafter.

### Profiling

The results were computed on my local machine under WSL2 (Windows Subsystem for Linux) with an NVIDIA RTX 3060 GPU. A single forward-and-backprop step was about 3 times faster in the CUDA neural network than in the sequential implementation. The batch size used was 100 with a learning rate of 0.001.

#### CUDA Neural Network

Time per epoch: 293590056 microseconds

Avg Time Per Step: 4893 microseconds

This is a plot of the time taken by each step in the first epoch, in microseconds: the X-axis is the step number and the Y-axis is the time taken.

<img src="Picture1.png" alt="cuda" width="500"/>

#### Sequential Neural Network

Time per mini-batch: 1557228 microseconds

Avg Time Per Step: 15339 microseconds

This is a plot of the time taken by each step in the first epoch, in microseconds: the X-axis is the step number and the Y-axis is the time taken.

<img src="Picture2.png" alt="seq" width="500"/>

cross_entropy.cu

Lines changed: 21 additions & 0 deletions

#include "cross_entropy.hh"

#include <math.h>
#include <iostream>

// Cross-entropy loss for a single sample: the negative log of the
// probability the network assigned to the target class.
float CECost::cost(host_vec preds, int target)
{
    float cost_value = 0.0;
    cost_value += -log(preds[target]);

    // std::cout << "Cost value: " << cost_value << std::endl;
    return cost_value;
}

// Gradient of the loss w.r.t. the predictions: only the target entry is
// nonzero, since d(-log p_t)/dp_t = -1/p_t.
host_vec CECost::dCost(host_vec preds, int target, host_vec dY) {

    dY[target] = -1.0 / preds[target];
    return dY;
}

cross_entropy.hh

Lines changed: 8 additions & 0 deletions

#pragma once
#include "nn_layer.hh"

// Cross-entropy cost and its gradient for a single sample's predictions.
class CECost {
public:
    float cost(host_vec predictions, int target);
    host_vec dCost(host_vec preds, int target, host_vec dY);
};

dataset.cu

Lines changed: 146 additions & 0 deletions

#include "dataset.hh"
#include "utils.hh"

#include <cassert>
#include <cmath>
#include <fcntl.h>
#include <fstream>
#include <iostream>
#include <unistd.h>
#include <vector>

// Reads the MNIST image file (IDX format), normalizes each 28x28 image to
// [-1, 1], flattens it, and groups the images into batches of batch_size.
void MNISTDataset::read_mnist_images() {

    int rv;
    int fd = open(fn_images.c_str(), O_RDONLY);
    assert(fd >= 0);

    int magic = read_int(fd);
    assert(magic == 0x803); // magic number of the IDX image file

    int n_images = read_int(fd);

    int n_rows = read_int(fd);
    assert(n_rows == 28);

    int n_cols = read_int(fd);
    assert(n_cols == 28);

    number_of_batches = n_images / batch_size;

    // Pre-allocate the image and target containers for every batch.
    for (size_t i = 0; i < number_of_batches; i++) {
        thrust::host_vector<thrust::host_vector<float>> batch(batch_size);
        batches.push_back(batch);

        thrust::host_vector<int> batch_targets(batch_size, 0);
        targets.push_back(batch_targets);
    }

    int batch_idx = 0;
    int idx = 0;
    int batch_img_idx = 0;
    for (int i = 0; i < n_images; i++) {
        if (i > 0 && i % batch_size == 0) {
            batch_idx++;
            batch_img_idx = 0;
        }

        unsigned char tmp[28][28];
        rv = read(fd, tmp, 28*28); assert(rv == 28*28);

        idx = 0;
        thrust::host_vector<float> temp_img(28*28, 0.0);
        for (int r = 0; r < 28; r++) {
            for (int c = 0; c < 28; c++) {
                // Scale pixel values from [0, 255] to [-1, 1].
                temp_img[idx] = float(tmp[r][c]) / 127.5 - 1;
                idx++;
            }
        }
        batches[batch_idx][batch_img_idx] = temp_img;
        batch_img_idx++; // advance to the next slot in the current batch
    }

    rv = close(fd); assert(rv == 0);
}

// Reads the MNIST label file (IDX format) and fills the per-batch targets
// so that targets[b][i] is the label of batches[b][i].
void MNISTDataset::read_mnist_labels() {

    int rv;
    int fd = open(fn_labels.c_str(), O_RDONLY);
    assert(fd >= 0);

    int magic = read_int(fd);
    assert(magic == 0x801); // magic number of the IDX label file

    int n_labels = read_int(fd);
    std::vector<unsigned char> labels(n_labels);

    rv = read(fd, labels.data(), n_labels); assert(rv == n_labels);

    int batch_idx = 0;
    int idx = 0;
    for (int i = 0; i < n_labels; i++) {
        assert(labels[i] <= 9);

        if (i > 0 && i % batch_size == 0) {
            batch_idx++;
            idx = 0;
        }

        targets[batch_idx][idx] = static_cast<int>(labels[i]);
        idx++; // advance to the next slot in the current batch
    }
    std::cout << n_labels << std::endl;
    std::cout << targets[0].size() << std::endl;

    rv = close(fd); assert(rv == 0);
}

MNISTDataset::MNISTDataset(size_t batch_size, const std::string &fn_images, const std::string &fn_labels):
    batch_size(batch_size), fn_images(fn_images), fn_labels(fn_labels)
{
    read_mnist_images();
    read_mnist_labels();
}

int MNISTDataset::getNumOfBatches() {
    return number_of_batches;
}

thrust::host_vector<thrust::host_vector<thrust::host_vector<float>>>& MNISTDataset::getBatches() {
    return batches;
}

thrust::host_vector<thrust::host_vector<int>>& MNISTDataset::getTargets() {
    return targets;
}

// Writes a single flattened image as an ASCII PGM file (inverted grayscale)
// so it can be inspected with an image viewer.
void output_pgm(const std::string &fn, thrust::host_vector<float>& img) {

    std::ofstream ofs(fn, std::fstream::out|std::fstream::trunc);

    ofs << "P2\n";
    ofs << "28 28\n";
    ofs << "255\n";
    int idx = 0;
    for (int i = 0; i < 28; i++) {
        for (int j = 0; j < 28; j++) {
            if (j > 0) {
                ofs << " ";
            }
            // Undo the [-1, 1] normalization and invert for display.
            ofs << 255 - int(std::round(127.5 * (img[idx] + 1)));
            idx++;
        }
        ofs << "\n";
    }
}

/* testing */
// int main()
// {
//     MNISTDataset data_obj(1, "mnist/train-images-idx3-ubyte", "mnist/train-labels-idx1-ubyte");
//     auto batches = data_obj.getBatches();
//     output_pgm("img0.pgm", batches[0][0]);
//     // output_pgm("img1.pgm", batches[5][4]);
//
//     return 0;
// }
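
Note: utils.hh, which provides read_int, is not among the files shown in this commit. The sketch below is an assumed implementation consistent with its use above, relying only on the fact that MNIST IDX headers store 32-bit integers in big-endian byte order.

#include <cassert>
#include <cstdint>
#include <unistd.h>

// Assumed sketch of utils.hh's read_int: reads a 32-bit big-endian integer
// (the MNIST IDX header encoding) and converts it to host byte order.
int read_int(int fd) {
    uint8_t bytes[4];
    ssize_t rv = read(fd, bytes, 4);
    assert(rv == 4);
    uint32_t v = (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16) |
                 (uint32_t(bytes[2]) << 8)  |  uint32_t(bytes[3]);
    return int(v);
}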

dataset.hh

Lines changed: 25 additions & 0 deletions

#pragma once

#include <string>
#include <thrust/host_vector.h>

// Loads the MNIST images and labels from disk and exposes them as
// host-side batches of flattened images with matching integer targets.
class MNISTDataset {
private:
    size_t batch_size;
    size_t number_of_batches;
    std::string fn_images, fn_labels;

    // batches[b][i] is the i-th flattened 28x28 image of batch b;
    // targets[b][i] is its digit label (0-9).
    thrust::host_vector<thrust::host_vector<thrust::host_vector<float>>> batches;
    thrust::host_vector<thrust::host_vector<int>> targets;

    void read_mnist_labels();
    void read_mnist_images();

public:

    MNISTDataset(size_t batch_size, const std::string &fn_images, const std::string &fn_labels);

    int getNumOfBatches();
    thrust::host_vector<thrust::host_vector<thrust::host_vector<float>>>& getBatches();
    thrust::host_vector<thrust::host_vector<int>>& getTargets();
};
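
For reference, a minimal usage sketch of the MNISTDataset API declared above; the mnist/ paths follow the commented-out test harness in dataset.cu.

#include "dataset.hh"
#include <iostream>

// Load the training set in batches of 100 and inspect the first batch.
int main() {
    MNISTDataset data(100, "mnist/train-images-idx3-ubyte",
                      "mnist/train-labels-idx1-ubyte");

    auto& batches = data.getBatches();  // [batch][image][pixel]
    auto& targets = data.getTargets();  // [batch][image]

    std::cout << data.getNumOfBatches() << " batches, first batch has "
              << batches[0].size() << " images, first label = "
              << targets[0][0] << std::endl;
    return 0;
}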
