Hi guys, I'm new here (literally created this account 5 minutes ago) and I want to ask a technical question about AI. I'm not sure whether this forum is suitable for technical questions, but here it is:
Context: I have been doing AI for a long time now, purely out of curiosity, and have no intention (for now) of applying it in any field. Lately I have been testing different kinds of neural networks to see which work and which don't. I program all of these in Processing, which is based on Java, and I don't use any libraries like TensorFlow at all. Because of that, it would be hard to post the full code, since you'd need a lot of time to comprehend it, so I'll just give the setup and the results I found here:
Network setup: Goal: recognize the MNIST handwritten digits (really popular dataset in AI).
Architecture: plain feed forward network
Dimensions: 28*28+1 neurons (input layer; the last input is always 1 in every sample, to play the role of the bias), 50 neurons (hidden layer), 10 neurons (output layer).
Correct answer format: if a 0 is presented, the first neuron in the output layer should be 1 and all the others 0, and likewise for every other digit.
Activation function: sigmoid
Learning constant: 0.04 (plain backpropagation).
Learning rule: Delta learning rule
Training samples: 1000
Batch size: 150 (the last batch has size 100)
Testing samples: 100
Evaluation functions used (a short code sketch of these is included after this list):
Error: sum over all samples( sum over all output neurons( abs(prediction - answer) ) )
Confidence: sum over all samples( sum over all output neurons( min(prediction, 1 - prediction) ) ) / number of samples. The more confident the network is in its outputs, the lower this value is.
Accuracy: sum over all samples( Kronecker delta( argmax(output neurons), argmax(correct answer neurons) ) ) / number of samples
Extension evaluation functions:
Confidence: confidence on the first batch of the training samples
Real confidence: confidence on the testing samples
Accuracy: accuracy on the first batch of the training samples
Real accuracy: accuracy on the testing samples
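Since posting the full program would take too long to read, here is a simplified Java/Processing-style sketch of the forward pass and the three metrics above. All the names (feedForward, errorOf, confidenceOf, accuracyOf, w1, w2) are made up for illustration; this isn't my actual code, just how the setup above translates into code:

```java
// Simplified sketch of the setup (illustrative names, not my actual code).
// Weights would be initialized randomly elsewhere; this only shows shapes and math.
float[][] w1 = new float[50][785];   // input (28*28 pixels + 1 bias input) -> hidden
float[][] w2 = new float[10][50];    // hidden -> output

float sigmoid(float x) {
  return 1.0f / (1.0f + (float) Math.exp(-x));
}

// Forward pass; the input's last element is always 1 (the bias trick described above).
float[] feedForward(float[] input) {
  float[] hidden = new float[50];
  for (int j = 0; j < 50; j++) {
    float sum = 0;
    for (int i = 0; i < 785; i++) sum += w1[j][i] * input[i];
    hidden[j] = sigmoid(sum);
  }
  float[] output = new float[10];
  for (int k = 0; k < 10; k++) {
    float sum = 0;
    for (int j = 0; j < 50; j++) sum += w2[k][j] * hidden[j];
    output[k] = sigmoid(sum);
  }
  return output;
}

// Error: sum over all samples and all output neurons of abs(prediction - answer).
float errorOf(float[][] predictions, float[][] answers) {
  float total = 0;
  for (int s = 0; s < predictions.length; s++)
    for (int k = 0; k < 10; k++)
      total += Math.abs(predictions[s][k] - answers[s][k]);
  return total;
}

// Confidence: average distance of each output from the nearest of 0 and 1.
// The more saturated (confident) the outputs are, the lower this value gets.
float confidenceOf(float[][] predictions) {
  float total = 0;
  for (int s = 0; s < predictions.length; s++)
    for (int k = 0; k < 10; k++)
      total += Math.min(predictions[s][k], 1 - predictions[s][k]);
  return total / predictions.length;
}

// Accuracy: fraction of samples where the most active output neuron is the correct digit.
float accuracyOf(float[][] predictions, float[][] answers) {
  int correct = 0;
  for (int s = 0; s < predictions.length; s++)
    if (argmax(predictions[s]) == argmax(answers[s])) correct++;
  return (float) correct / predictions.length;
}

int argmax(float[] v) {
  int best = 0;
  for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
  return best;
}
```

Note that confidence is computed on the raw sigmoid outputs, so it can never go below 0; a perfectly saturated network would score exactly 0.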
These are the graphs I collected from running this:
This is a closer look at the beginning:
Observations:
- Accuracy and real accuracy graphs closely match each other in shape
- Confidence and real confidence graphs closely match each other in shape
- Accuracy keeps improving over time, but most of the time it stays flat and unchanging and only increases at specific points (let's call them spikes)
- When a spike occurs, confidence increases dramatically, accuracy increases by a bit and error nudges a bit and slopes downward faster, eventually reaching a new, lower equilibrium.
Further experiments:
- The spikes seem to occur at random and you can't really tell when one is coming up
- Spikes stop happening when you reach an accuracy of around 0.96 (96% correct)
- Sometimes the error drops sharply to 0 and rises back up to its nominal value just before a spike happens.
- When I run it with a learning constant of 0.02, the spikes don't appear at all and the network races towards 0.96 accuracy right away. A summary graph of it can be found here: http://157239n.com/frame%203.png
- At first, I suspected the spikes were due to the network experiencing learning slowdown, owing to the nature of the sigmoid activation function combined with the quadratic cost function. That slowdown can be dealt with by using the cross-entropy cost function, but then why don't the spikes appear with a learning constant of 0.02, when the network should still be stuck with the same slowdown? I'm working on implementing the cross-entropy cost function to see what happens (a rough sketch of the change is below), but in the meantime, that's all the information I've got.
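To make the cross-entropy point concrete, the only piece that should change is the delta at the output layer during backpropagation. A rough sketch (names are mine, assuming sigmoid output neurons):

```java
// Output-layer delta with the quadratic cost: the sigmoid-prime factor
// output * (1 - output) goes to 0 when the neuron saturates near 0 or 1,
// which is the "learning slowdown" I suspect is behind the flat stretches.
float quadraticDelta(float output, float target) {
  return (output - target) * output * (1 - output);
}

// Output-layer delta with the cross-entropy cost: the sigmoid-prime factor
// cancels out, so a saturated-but-wrong neuron still gets a large gradient.
float crossEntropyDelta(float output, float target) {
  return output - target;
}
```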
My question: how can those spikes be explained? What are they, exactly? What causes them? And can I somehow trigger a spike to happen so that the network can continue to learn?
Thanks in advance if anyone knows anything about this. Let me know if you have any questions.