Hello folks!
While introducing myself, I mentioned that I work on adding a Deep Q-Network (DQN) implementation to the library ConvNetSharp. For days now, I've been facing one particular issue: during training, the output values grow exponentially until they reach negative or positive infinity. For this issue I could use some fresh ideas or opinions that might help me track down its cause.
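For reference, here is the recursion at the heart of the training: the textbook Q-learning target (a generic sketch, not my actual code). I'm including it because it shows how the net's own outputs feed back into the training targets on every update, which is the only mechanism I can see that would let the values grow without bound:

```csharp
// Textbook Q-learning target for one experience (s, a, r, s'):
//   y = r                              if s' is terminal
//   y = r + gamma * max_a' Q(s', a')   otherwise
// If gamma >= 1, or terminal transitions aren't cut off, the max over the
// net's own outputs keeps feeding back into the targets and can diverge.
using System.Linq;

public static class QTarget
{
    public static double Compute(double reward, bool nextIsTerminal,
                                 double[] nextQValues, double gamma = 0.9)
    {
        return nextIsTerminal ? reward : reward + gamma * nextQValues.Max();
    }
}
```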
So here is some information about the project itself; afterwards, I'll list the troubleshooting steps I've taken. For C#, there are not many promising neural network libraries out there. I have already worked with Encog, which is quite convenient, but it provides neither GPU support nor convolutional neural nets. The alternative I chose is ConvNetSharp. The drawback of that library is its lack of documentation and in-code comments, but it supports CUDA (via managedCuda). Another option would be to implement some interface between C# and Python, but I don't have a promising concept for such an approach; TCP, for example, would most likely turn out to be a bottleneck.

My DQN implementation is adapted from ConvNetJS's deepqlearn.js and a former ConvNetSharp port. For testing my implementation, I created a slot machine simulation with 3 reels, which are stopped individually by the agent. The agent receives the current 9 slots as input. The available actions are Wait and Stop Reel. A reward is handed out according to the score as soon as all reels are stopped; the best score is 1. If I use the old ConvNetSharp port with its DQN demo, the action values (the neural net's output values) stay below 1. In the same scenario, my implementation, which uses the most recent version of ConvNetSharp, suffers from exponential output growth during training.
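To make that setup concrete, here is a minimal sketch of the simulation as described above. All names and the scoring rule are made up for illustration and don't match my actual code:

```csharp
// Hypothetical sketch of the slot machine simulation - names are illustrative.
using System;
using System.Linq;

public class SlotMachine
{
    public const int Reels = 3;
    public const int SlotsPerReel = 3;                 // 3 x 3 = 9 visible slots
    private readonly Random _random = new Random();
    private readonly int[] _slots = new int[Reels * SlotsPerReel];
    private readonly bool[] _stopped = new bool[Reels];

    public bool AllStopped => _stopped.All(s => s);

    // The agent's observation: the 9 currently visible slots.
    public double[] GetState() => _slots.Select(s => (double)s).ToArray();

    // Actions: 0 = Wait, 1 = Stop Reel (stops the next spinning reel).
    // Reward: 0 while reels are spinning, the score (best: 1) once all are stopped.
    public double Step(int action)
    {
        if (action == 1)
        {
            var reel = Array.IndexOf(_stopped, false);
            if (reel >= 0) _stopped[reel] = true;
        }

        // Advance every reel that is still spinning.
        for (var r = 0; r < Reels; r++)
        {
            if (_stopped[r]) continue;
            for (var s = 0; s < SlotsPerReel; s++)
                _slots[r * SlotsPerReel + s] = _random.Next(0, 5); // 5 symbols, arbitrary
        }

        return AllStopped ? ComputeScore() : 0.0;
    }

    // Arbitrary placeholder scoring: fraction of matching symbols on the middle
    // row. The real rules don't matter for the exploding-output issue.
    private double ComputeScore()
    {
        var middleRow = Enumerable.Range(0, Reels).Select(r => _slots[r * SlotsPerReel + 1]);
        return (double)middleRow.GroupBy(v => v).Max(g => g.Count()) / Reels;
    }
}
```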
Here is what I have checked so far:
- Logged inputs, outputs, rewards, and experiences (everything except the growing outputs looks fine; see the logging sketch after this list)
- Tested the slot machine simulation with the former ConvNetSharp DQN demo (the agent does not come up with a suitable solution, but the outputs do not explode)
- Varied hyperparameters, such as a very low learning rate and both big and small training batches
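One way I could sharpen the logging from the first point (a sketch with made-up names): track the peak absolute output per training step, so the onset of the divergence can be pinned to an exact step and to the experiences sampled there.

```csharp
// Hypothetical probe - illustrates the logging idea, not my actual code.
using System;

public static class DivergenceProbe
{
    private static double _peak;

    // Call once per training step with the net's current output values.
    public static void Check(int step, double[] qValues, double threshold = 100.0)
    {
        foreach (var q in qValues)
        {
            var abs = Math.Abs(q);
            if (abs <= _peak) continue;
            _peak = abs;
            if (abs > threshold)
                Console.WriteLine($"step {step}: new |Q| peak {abs:G4} - inspect this batch");
        }
    }
}
```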
There are two components that are still vague to me. ConvNetSharp's regression layer was introduced only recently, and I'm not sure whether I'm using the Volume (i.e. tensor) object as its author intended. And since I'm not familiar with the actual implementation details of neural nets, I cannot figure out whether the issue is caused by ConvNetSharp or by my own code. I've been in touch with the author of ConvNetSharp a few times, but still haven't been able to make progress on this issue. Parts of it are tracked on GitHub.
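To make concrete what I mean by the Volume usage: ConvNetJS's deepqlearn.js trains its regression on a single output dimension (the taken action), whereas a full regression target contains one value per output. My understanding is that the non-taken actions must therefore receive the net's own current predictions as targets, so that only the taken action produces an error. Here is that construction in plain arrays; I've deliberately left out the actual Volume calls, since that is exactly the part I'm unsure about:

```csharp
public static class DqnTargets
{
    // Regression target for one experience, as I understand it should be built.
    // qCurrent: the net's current outputs for state s (one value per action);
    // takenAction / target: the executed action and its Q-learning target.
    public static double[] Build(double[] qCurrent, int takenAction, double target)
    {
        var y = (double[])qCurrent.Clone(); // non-taken actions keep their prediction => zero error
        y[takenAction] = target;            // taken action regresses toward r + gamma * max Q(s')
        return y;
    }
}
```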
It would be great if someone had some fresh ideas for gaining new insights into the underlying issue.