
Project Structure:

Simple CNN training: 

A network with two convolution layers, each with 32 3x3 filters, a max pooling layer with a 2x2 window, a dropout layer with a 0.25 dropout rate, a flatten layer, a dense layer with 128 nodes and ReLU activation, a second dropout layer with a 0.5 dropout rate, and a final dense layer for the output with 10 nodes and the softmax activation function. When compiling the model, use categorical cross-entropy as the loss function and Adam as the optimizer. The metric should be accuracy.

python3 mnist_cnn_simple.py
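
The architecture above corresponds roughly to the following Keras code (a minimal sketch, not the exact contents of mnist_cnn_simple.py; the ReLU activations on the convolution layers and the tensorflow.keras import path are assumptions):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    # Two 32-filter 3x3 convolutions, 2x2 max pooling, dropout, then the dense head.
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(0.25),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])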

Multi Simple CNN training:

Train the network with 108 different parameter combinations, then output a CSV file with the accuracy in each epoch on the training and test sets.

  python3 mnist_cnn_multi.py
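
The sweep can be enumerated with itertools.product over the four parameters listed under Task 4 below (a minimal sketch; the variable names and the CSV layout in mnist_cnn_multi.py are assumptions):

    import itertools

    pool_sizes = [2, 4, 6]                                    # I: 3 options
    filter_sizes = [3, 5, 7]                                  # J: 3 options
    filter_counts = [(32, 32), (64, 32), (32, 64), (64, 64)]  # O: 4 options
    dense_nodes = [128, 256, 512]                             # K: 3 options

    grid = list(itertools.product(pool_sizes, filter_sizes, filter_counts, dense_nodes))
    print(len(grid))  # 3 * 3 * 4 * 3 = 108 combinations

For each combination, a model is built and trained for 12 epochs, and the per-epoch training and test accuracies are written as rows of the output CSV.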

Evaluate MNIST Test:

Evaluate the trained network on the first ten digits of the MNIST test set and visualize the results, or evaluate it on hand-written digit images.

evaluate on the MNIST test set, first 10 images:
    python3 mnist_evaluate.py ../models/mnist_cnn_simple.h5 0
evaluate on hand-written digits:
    python3 mnist_evaluate.py ../models/mnist_cnn_simple.h5 1 ../data/digits/
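
A minimal sketch of the first mode (evaluating on the first ten MNIST test images); the hand-written-digit mode and the result visualization in mnist_evaluate.py are omitted here:

    import sys
    import numpy as np
    from tensorflow.keras.models import load_model
    from tensorflow.keras.datasets import mnist

    # Load the trained model passed on the command line, e.g. ../models/mnist_cnn_simple.h5
    model = load_model(sys.argv[1])
    (_, _), (x_test, y_test) = mnist.load_data()
    x = x_test[:10].astype('float32').reshape(-1, 28, 28, 1) / 255.0

    preds = np.argmax(model.predict(x), axis=1)
    print('predicted:', preds)
    print('actual:   ', y_test[:10])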

Layer analysis MNIST:

Examine the first layers of the mnist_cnn_simple network.

python3 mnist_layer_analysis.py ../models/mnist_cnn_simple.h5

View MNIST:

View images of digits in the MNIST dataset.

python3 mnist_view.py (default 10) or python3 mnist_view.py 5

CNN Gabor:

A network with a fixed first layer of 32 Gabor filters, a convolution layer with 32 3x3 filters, a max pooling layer with a 2x2 window, a dropout layer with a 0.25 dropout rate, a flatten layer, a dense layer with 128 nodes and ReLU activation, a second dropout layer with a 0.5 dropout rate, and a final dense layer for the output with 10 nodes and the softmax activation function. When compiling the model, use categorical cross-entropy as the loss function and Adam as the optimizer. The metric should be accuracy.

python3 mnist_cnn_garbor.py
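
A minimal sketch of how the fixed Gabor first layer can be set up; the Gabor parameters (sigma, lambda, gamma) and the ReLU activations are assumptions, and the 5x5 / 3x3 kernel sizes follow the extension notes at the end of this page:

    import numpy as np
    import cv2
    from tensorflow.keras import layers, models

    # 32 5x5 Gabor kernels at evenly spaced orientations.
    gabor_kernels = np.stack(
        [cv2.getGaborKernel((5, 5), 2.0, np.pi * i / 32, 5.0, 0.5) for i in range(32)],
        axis=-1).reshape(5, 5, 1, 32).astype('float32')

    model = models.Sequential([
        layers.Conv2D(32, (5, 5), activation='relu', trainable=False,  # fixed Gabor layer
                      input_shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), padding='same', activation='relu'),  # keeps 24x24 output
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax'),
    ])
    # Load the Gabor kernels into the frozen first layer (bias left at zero).
    model.layers[0].set_weights([gabor_kernels, np.zeros(32, dtype='float32')])
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])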

Greek MNIST embedding:

Use the MNIST network as an embedding space to classify Greek letters.

    python3 greek_mnist_embedding.py ../models/mnist_cnn_simple.h5 ../data/greek_training_data.csv ../data/greek_training_labels.csv ../data/greek_testing_data.csv ../data/greek_testing_labels.csv
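
A minimal sketch of the embedding step (assumptions: the embedding is taken from the 128-node dense layer, the CSV files hold flattened 28x28 rows with pixel values 0-255, and scikit-learn's KNeighborsClassifier stands in here for the KNN in classifiers.py):

    import sys
    import numpy as np
    from tensorflow.keras.models import load_model, Model
    from sklearn.neighbors import KNeighborsClassifier

    model = load_model(sys.argv[1])
    # Truncate the network at the 128-node dense layer to get the embedding space.
    embedder = Model(inputs=model.input, outputs=model.layers[-3].output)

    def embed(csv_path):
        # Assumes one flattened 28x28 image per row, pixel values 0-255.
        data = np.loadtxt(csv_path, delimiter=',').reshape(-1, 28, 28, 1) / 255.0
        return embedder.predict(data)

    train_x, train_y = embed(sys.argv[2]), np.loadtxt(sys.argv[3], delimiter=',')
    test_x, test_y = embed(sys.argv[4]), np.loadtxt(sys.argv[5], delimiter=',')

    knn = KNeighborsClassifier(n_neighbors=1).fit(train_x, train_y)
    print('test accuracy:', knn.score(test_x, test_y))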

Data processing for Greek letters:

Process Greek letter images into data and label CSV files.

python3 greek_data_processing.py ../data/greek/ ../data/
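
A minimal sketch of the kind of processing involved; the resizing, inversion, filename-to-label convention, and output file names here are all assumptions about greek_data_processing.py:

    import os, sys
    import cv2
    import numpy as np

    src_dir, out_dir = sys.argv[1], sys.argv[2]
    labels = {'alpha': 0, 'beta': 1, 'gamma': 2}         # hypothetical label mapping

    rows, ys = [], []
    for fname in sorted(os.listdir(src_dir)):
        img = cv2.imread(os.path.join(src_dir, fname), cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (28, 28))
        img = 255 - img                                  # assume dark ink on a light page
        rows.append(img.flatten())                       # 28*28 = 784-dimensional vector
        ys.append(labels[fname.split('_')[0]])           # hypothetical filename convention

    np.savetxt(os.path.join(out_dir, 'greek_training_data.csv'),
               np.array(rows), fmt='%d', delimiter=',')
    np.savetxt(os.path.join(out_dir, 'greek_training_labels.csv'),
               np.array(ys), fmt='%d', delimiter=',')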

Classifier:

A KNN classifier. It can be tested on raw intensity data; the classifier is also used by greek_mnist_embedding.py.

python3 classifiers.py ../data/greek_training_data.csv ../data/greek_training_labels.csv ../data/greek_testing_data.csv ../data/greek_testing_labels.csv
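
A minimal sketch of a KNN classifier of this kind, using sum-of-squared-differences (SSD) distance; the structure of the real classifiers.py may differ:

    import sys
    import numpy as np

    def knn_predict(train_x, train_y, test_x, k=1):
        # Label each test vector by the majority label of its k nearest
        # training vectors under SSD distance.
        preds = []
        for x in test_x:
            ssd = np.sum((train_x - x) ** 2, axis=1)
            nearest = train_y[np.argsort(ssd)[:k]].astype(int)
            preds.append(np.bincount(nearest).argmax())
        return np.array(preds)

    train_x = np.loadtxt(sys.argv[1], delimiter=',')
    train_y = np.loadtxt(sys.argv[2], delimiter=',')
    test_x = np.loadtxt(sys.argv[3], delimiter=',')
    test_y = np.loadtxt(sys.argv[4], delimiter=',')

    preds = knn_predict(train_x, train_y, test_x, k=1)
    print('accuracy:', np.mean(preds == test_y))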



Tasks:

Task 1: build and train a network for digit recognition on the MNIST dataset.

We used the Keras package.

These are example digits from the dataset, plotted with matplotlib.

 


Network Evaluation:

Both training and test accuracy increase during training, with the training accuracy improving more significantly.

Test with the first 10 examples

The model performs well, with an accuracy of 100% on these examples.

Handwriting test

In this task, we wrote some digits by hand and then tested them with the trained model. The accuracy is 80%, with an 8 mistakenly classified as a 9 and a 3 classified as a 5.

Task 2: examine the network and analyze it layer by layer

We analyzed the first couple of layers of the simple CNN we trained in Task 1. First, we looked at the first convolutional layer. The filters are shown in the image below.

First layer filters:

To analyze the first-layer filters, we applied them directly to the input image with the OpenCV filter2D function and compared the result with the output of the first layer of the CNN. The images are shown below. As expected, the patterns are highly consistent between the two. The network's first-layer output differs from the direct filtering in its background intensity; this difference is likely due to the bias vector included by default in the Keras Conv2D layer.

Left: layer 0 filters applied to the first training image with OpenCV filter2D. Right: the output of layer 0 of the model when the first training image is passed through it.
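
A minimal sketch of this comparison (the normalization and layer indexing are assumptions; the real mnist_layer_analysis.py may differ):

    import sys
    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model, Model
    from tensorflow.keras.datasets import mnist

    model = load_model(sys.argv[1])            # e.g. ../models/mnist_cnn_simple.h5
    (x_train, _), _ = mnist.load_data()
    img = x_train[0].astype('float32') / 255.0

    # Direct filtering: apply each learned 3x3 kernel to the image with OpenCV.
    kernels, biases = model.layers[0].get_weights()    # kernels: shape (3, 3, 1, 32)
    direct = [cv2.filter2D(img, -1, kernels[:, :, 0, i]) for i in range(32)]

    # Network output: truncate the model after the first layer.
    first_layer = Model(inputs=model.input, outputs=model.layers[0].output)
    net_out = first_layer.predict(img.reshape(1, 28, 28, 1))[0]

    # The outputs after the first two or three layers (used below) are obtained the
    # same way, with outputs=model.layers[1].output or model.layers[2].output.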

Then, we looked at the output of the first layer, the first two layers, and the first three layers. The first two layers are two convolutional layers, while the third layer is a pooling layer. We looked at the output for four input examples.

From the pooling layer, we can already see some feature detection. Examining the outputs for the four sample inputs, we see, for example, that the first node (row 0, col 0) seems to detect northeast-southwest line segments, while the second node (row 0, col 1) seems to detect northwest-southeast line segments.

The output of the first layer, the first (conv) + second (conv) layers, and the first (conv) + second (conv) + third (pooling) layers when passing in the first training image.

The output of the first layer, the first (conv) + second (conv) layers, and the first (conv) + second (conv) + third (pooling) layers when passing in the second training image.

The output of the first layer, the first (conv) + second (conv) layers, and the first (conv) + second (conv) + third (pooling) layers when passing in the third training image.

The output of the first layer, the first (conv) + second (conv) layers, and the first (conv) + second (conv) + third (pooling) layers when passing in the fourth training image.



Task 3: embedding space for images of written symbols

 

We used the embedding space from the model trained in the previous task to recognize written Greek letters. Below are the SSD results for the alpha, beta, and gamma training sets. The beta and gamma classifications show convincing results, while the result for alpha is not as good as desired.

 


For the test set classification, we handwrote the three symbols at different scales and under different conditions:

 

Then we applied KNN to classify the test set:

 


We used the KNN classifier on the images both before the model (raw 28x28 = 784-dimensional vectors) and after the model (128-dimensional embedding vectors). As shown in the picture on the left, the test set results on the raw vectors contain significant errors, but after processing through the model, the test set classification is 100% correct. This shows that the convolutional stack is very useful as an embedding space.

 

Task 4

Four parameters:

I - Pooling layer size: 2x2 / 4x4 / 6x6 max pooling

J - Convolution filter size: 3x3 / 5x5 / 7x7

O - Number of convolution filters: 32,32 / 64,32 / 32,64 / 64,64

K - Dense nodes: 128 / 256 / 512

There are I x J x O x K = 3 x 3 x 4 x 3 = 108 combinations.
Running each epoch takes about half a minute, so running each model for 12 epochs takes about 5 minutes. Overall, it took about 12 hours to finish training and testing all of these models. Some quick analysis of the data:
  
After sorting the models by their first-epoch training accuracy, we can observe how strongly each of the four parameters determines performance. Pooling size and the number of dense nodes carry more weight, while the number of filters and the filter size vary widely among the highly accurate models. A smaller pooling size and a larger number of dense nodes make the model more accurate in the very first epoch.

Looking at the number of dense nodes specifically:

Accuracy increases overall, and the models with 512 dense nodes reach the highest accuracy. Additionally, the accuracy drops toward the end, possibly due to overfitting.

When we look at pooling size:

Blue: pooling size = 2; red: pooling size = 4; green: pooling size = 8.

Overall, the best combination according to first-epoch accuracy is: pooling size = 2; filter size = 2; number of filters = 64 & 64; 512 dense nodes.
According to the final training accuracy, the best combination is the same.
For average accuracy, the top three performers all have pooling size = 2, 64 & 64 filters, and 512 dense nodes, while the filter size varies from 3 to 7.

Extensions

1. Evaluate more dimensions (see Task 4).

2. KNN classifier (see Task 3).

4. In our new architecture, we use 5x5 Gabor filters (python3 mnist_cnn_garbor.py) as a fixed first layer, followed by a 3x3 second convolution layer with padding so that the output stays a consistent 24x24.

Our Gabor model vs the previous model:

Gabor model accuracy vs previous model accuracy:

 

Because the first convolution layer is fixed and does not receive backpropagation updates, the Gabor filter model trains slightly faster than the previous model while maintaining good accuracy.

 

Acknowledgement:

This is a group project by Mike Zheng and Heidi He.
