Simple CNN training:
a network with two convolution layers, each with 32 3x3 filters; a max pooling layer with a 2x2 window; a dropout layer with a 0.25 dropout rate; a flatten layer; a dense layer with 128 nodes and ReLU activation; a second dropout layer with a 0.5 dropout rate; and a final dense output layer with 10 nodes and the softmax activation function. When compiling the model, use categorical cross-entropy as the loss function and Adam as the optimizer. The metric should be accuracy.
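A minimal Keras sketch of this architecture (ReLU on the two convolution layers is an assumption, since the description does not name their activation):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_simple_cnn():
    """Build the simple CNN described above for 28x28 grayscale MNIST input."""
    model = models.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),   # activation assumed
        layers.Conv2D(32, (3, 3), activation="relu"),   # activation assumed
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model

model = build_simple_cnn()
model.summary()
```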
Multi Simple CNN training:
Train the network several times with 108 parameter combinations. Then output a CSV file with the accuracy at each epoch on the training and test sets.
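A sketch of the per-epoch CSV output, assuming a Keras-style history dict (the helper function, key names, and the placeholder numbers are illustrative, not taken from the actual script):

```python
import csv

def history_to_csv(history, path):
    """Write per-epoch training and test accuracy to a CSV file.

    `history` is shaped like Keras' History.history, e.g.
    {"accuracy": [...], "val_accuracy": [...]}.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "train_accuracy", "test_accuracy"])
        for epoch, (tr, te) in enumerate(
                zip(history["accuracy"], history["val_accuracy"]), start=1):
            writer.writerow([epoch, tr, te])

# Placeholder numbers for illustration only
history_to_csv({"accuracy": [0.91, 0.96], "val_accuracy": [0.95, 0.97]},
               "example_results.csv")
```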
Evaluate Mnist Test:
Evaluate the trained network on the first ten digits of the MNIST dataset and visualize the results, or evaluate it on hand-written images.
evaluate on mnist test set first 10 images:
python3 mnist_evaluate.py ../models/mnist_cnn_simple.h5 0
evaluate on hand-written digits:
python3 mnist_evaluate.py ../models/mnist_cnn_simple.h5 1 ../data/digits/
Layer analysis mnist:
examine the first layers of the mnist_cnn_simple network
python3 mnist_layer_analysis.py ../models/mnist_cnn_simple.h5
view images of digits in the mnist dataset
python3 mnist_view.py (default 10) or python3 mnist_view.py 5
a network with a fixed first layer of 32 Gabor filters; a convolution layer with 32 3x3 filters; a max pooling layer with a 2x2 window; a dropout layer with a 0.25 dropout rate; a flatten layer; a dense layer with 128 nodes and ReLU activation; a second dropout layer with a 0.5 dropout rate; and a final dense output layer with 10 nodes and the softmax activation function. When compiling the model, use categorical cross-entropy as the loss function and Adam as the optimizer. The metric should be accuracy.
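The fixed Gabor bank could be generated directly in NumPy; the 5x5 kernel size matches the Gabor model described later, but the sigma/lambda/gamma/psi values below are illustrative assumptions:

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel of shape (size, size) at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_t / lambd + psi)
    return envelope * carrier

# 32 filters at evenly spaced orientations, shaped like Conv2D weights:
# (kernel_h, kernel_w, in_channels, out_channels)
thetas = np.linspace(0.0, np.pi, 32, endpoint=False)
bank = np.stack([gabor_kernel(5, t) for t in thetas], axis=-1)
bank = bank[:, :, np.newaxis, :]          # -> (5, 5, 1, 32)

# In Keras the bank could then be frozen in place with, e.g.:
# conv = tf.keras.layers.Conv2D(32, (5, 5), use_bias=False, trainable=False)
# conv.build((None, 28, 28, 1)); conv.set_weights([bank])
print(bank.shape)
```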
Greek MNIST embedding:
use the MNIST network as an embedding space to classify Greek letters
greek_mnist_embedding.py ../models/mnist_cnn_simple.h5 ../data/greek_training_data.csv ../data/greek_training_labels.csv ../data/greek_testing_data.csv ../data/greek_testing_labels.csv
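Conceptually, the embedding step truncates the trained network at its 128-node dense layer and uses those activations as features. A sketch using an untrained stand-in model rather than the real mnist_cnn_simple.h5 (the layer name "embedding" and the random placeholder images are assumptions):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Stand-in for the trained model that would be loaded from mnist_cnn_simple.h5
base = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu", name="embedding"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

# Truncate at the 128-node dense layer to obtain the embedding model
embedder = models.Model(inputs=base.input,
                        outputs=base.get_layer("embedding").output)

# Map images (e.g. the Greek letters) into the 128-d embedding space
images = np.random.rand(4, 28, 28, 1).astype("float32")  # placeholder data
embeddings = embedder.predict(images, verbose=0)
print(embeddings.shape)
```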
Data processing for greek letters:
process Greek letter images into data and label files
python3 greek_data_processing.py ../data/greek/ ../data/
A KNN classifier. It can be tested on raw intensity data; the classifier is also used in greek_mnist_embedding.py.
python3 classifiers.py ../data/greek_training_data.csv ../data/greek_training_labels.csv ../data/greek_testing_data.csv ../data/greek_testing_labels.csv
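A minimal NumPy sketch of such a KNN classifier (Euclidean/SSD distance with majority vote; the real classifiers.py may differ in distance metric and tie-breaking):

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Classify each test row by majority vote among its k nearest
    training rows (sum-of-squared-differences distance)."""
    preds = []
    for x in test_x:
        dists = np.sum((train_x - x) ** 2, axis=1)   # SSD to every training row
        nearest = train_y[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

# Tiny illustration with 2-d points in two clusters
train_x = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_x, train_y, np.array([[0.05, 0.05], [1.0, 0.95]])))
```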
Task1: build and train a network for digit recognition with the MNIST dataset.
We used the Keras package.
These are example digits rendered with matplotlib.
Both training and test accuracy increase during training, with the training accuracy improving more markedly.
Test with the first 10 examples
The model performs well: the accuracy on these ten examples is 100%.
In this task, we wrote some digits by hand and then tested them with the trained model. The accuracy is 80%, with an 8 mistakenly classified as a 9 and a 3 classified as a 5.
Task2: examine the network and analyze through layers
We analyzed the first couple of layers of the simple CNN we trained in Task 1. First, we looked at the first convolutional layer. The filters are shown in the image below.
First layer filters:
To analyze the first layer filters, we applied them directly to the input with OpenCV's filter2D function and compared the result to the output of the first layer of the CNN. The images are shown below. As expected, the patterns are highly consistent between the two. The network's first layer output differs from the direct filtering only in background intensity; the difference is likely due to the bias vector included by default in the Keras Conv2D layer.
Layer 0 filters applied to the first training image with OpenCV's filter2D (left), and the output of layer 0 from the model when the first training image is passed through (right).
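This check can also be reproduced without OpenCV: a "valid" cross-correlation in plain NumPy computes the same thing a bias-free Conv2D layer does, so the two outputs should agree up to the bias term. A sketch with a hypothetical Laplacian-like kernel:

```python
import numpy as np

def cross_correlate_valid(image, kernel):
    """'Valid' 2-D cross-correlation, i.e. what a Conv2D layer computes
    before adding its bias."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # a linear intensity ramp
kernel = np.array([[0.0, 1.0, 0.0],
                   [1.0, -4.0, 1.0],
                   [0.0, 1.0, 0.0]])               # Laplacian-like 3x3 filter
# The Laplacian of a linear ramp is zero everywhere
print(cross_correlate_valid(image, kernel))
```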
Then, we looked at the output of the first layer, the first two layers, and the first three layers. The first two layers are two convolutional layers, while the third layer is a pooling layer. We looked at the output for four input examples.
From the pooling layer, we can already see some feature detection. Examining the output for the four sample inputs, we see, for example, that the first node (row 0, col 0) appears to detect northeast-southwest line segments and the second node (row 0, col 1) appears to detect northwest-southeast line segments.
The output of the first (conv) layer, the first two (conv + conv) layers, and the first three (conv + conv + pooling) layers for the first training image.
The output of the first (conv) layer, the first two (conv + conv) layers, and the first three (conv + conv + pooling) layers for the second training image.
The output of the first (conv) layer, the first two (conv + conv) layers, and the first three (conv + conv + pooling) layers for the third training image.
The output of the first (conv) layer, the first two (conv + conv) layers, and the first three (conv + conv + pooling) layers for the fourth training image.
Task3: embedding space for images of written symbols
We used the embedding space from the model trained in the previous task to recognize written letters. Following are the SSD results on the alpha, beta, and gamma training sets. The beta and gamma classifications show convincing results, while the alpha classification is not as accurate.
For the test set classification, we handwrote the three symbols at different scales and under different conditions:
Then we applied KNN to classify the test set:
We used the KNN classifier on the images both before the model (28*28 = 784-dimensional vectors) and after it (1*128 = 128-dimensional embedding vectors). As shown in the picture on the left, the raw test set results contain significant errors, but after processing through the model the test set classification is 100% correct. This shows that the convolutional stack is very useful as an embedding space.
I - Pooling layer size: 2*2 / 4*4 / 6*6 max pooling
J - Convolution filter size: 3*3 / 5*5 / 7*7
O - Convolution number of filters: 32, 32 / 64, 32 / 32, 64 / 64, 64
K - Dense nodes: 128 / 256 / 512
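These four dimensions multiply out to 3 x 3 x 4 x 3 = 108 combinations, matching the 108 parameter settings mentioned earlier. A sketch of enumerating the grid (variable names are illustrative):

```python
from itertools import product

pool_sizes = [2, 4, 6]                                     # I
kernel_sizes = [3, 5, 7]                                   # J
filter_counts = [(32, 32), (64, 32), (32, 64), (64, 64)]   # O
dense_nodes = [128, 256, 512]                              # K

# Cartesian product over all four dimensions
configs = list(product(pool_sizes, kernel_sizes, filter_counts, dense_nodes))
print(len(configs))  # 108

for pool, kernel, (f1, f2), dense in configs[:2]:
    print(pool, kernel, f1, f2, dense)
```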
Considering the dense node count specifically:
When we look at pooling size:
Blue stands for pooling size = 2; red for pooling size = 4; green for pooling size = 8.
1. Evaluate more dimensions (see Task 4).
2. KNN classifier (see Task 2).
4. In our new architecture, we use 5*5 Gabor filters (python3 mnist_cnn_garbor.py) as a fixed first layer, followed by a 3*3 second convolution layer with a padding of 2 to keep the output size consistent at 24*24.
Our Gabor model vs the previous model:
Gabor model accuracy vs previous model accuracy:
Because the first convolution layer is fixed and excluded from backpropagation, the Gabor filter model trains slightly faster than the previous model while maintaining good accuracy.
This is a group project by Mike Zheng and Heidi He.