Ai Dreams Forum

Member's Experiments & Projects => AI Programming => Topic started by: Marco on August 08, 2017, 12:25:45 pm

Title: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 08, 2017, 12:25:45 pm
Hello folks!

While introducing myself (http://aidreams.co.uk/forum/index.php?topic=12414.msg48005#msg48005), I mentioned that I'm working on adding a Deep Reinforcement Learning (DQN) implementation to the library ConvNetSharp (https://github.com/cbovar/ConvNetSharp). For days now I've been stuck on one particular issue: during training, the output values grow exponentially until they reach negative or positive infinity. I could use some fresh ideas or opinions that might help me find new ways to track down the cause.

Here is some information about the project itself; afterwards (i.e. in the next paragraph), I'll list the troubleshooting steps I've taken. For C#, there are not many promising neural network libraries out there. I have already worked with Encog, which is quite convenient, but it supports neither GPUs nor convolutional neural nets. The alternative I chose is ConvNetSharp (https://github.com/cbovar/ConvNetSharp). Its drawback is the lack of documentation and in-code comments, but it supports CUDA (via managedCuda (https://github.com/kunzmi/managedCuda)). Another option would be an interface between C# and Python, but I don't have a good approach for that; TCP, for example, would most likely turn out to be a bottleneck.

The DQN implementation is adapted from ConvNetJS's deepqlearn.js (https://github.com/karpathy/convnetjs/blob/master/build/deepqlearn.js) and a former ConvNetSharp port (https://github.com/dubezOniner/Deep-QLearning-Demo-csharp/blob/master/DeepQLearning/DRLAgent/DeepQLearn.cs). To test my implementation (https://github.com/MarcoMeter/ConvNetSharp/tree/DQN/src/ConvNetSharp.ReinforcementLearning.Deep), I created a slot machine simulation (https://github.com/cbovar/ConvNetSharp/files/1172337/SlotMachine.zip) with 3 reels, which the agent stops individually. The agent receives the 9 currently visible slots as input. The available actions are Wait and Stop Reel. As soon as all reels are stopped, a reward is handed out according to the score; the best score is 1. With the old ConvNetSharp port and its DQN demo, the action values (the output values of the neural net) stay below 1. The same scenario on my implementation, using the most recent version of ConvNetSharp, suffers from exponential growth of these values during training.
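
For readers who haven't looked at deepqlearn.js: a DQN of this kind trains by regressing only the output of the action actually taken toward a bootstrapped target r + gamma * max Q(s', a'). The following is a minimal sketch under assumptions, not ConvNetSharp's or deepqlearn.js's code: the field names loosely follow deepqlearn.js's experience tuples (state0, action0, reward0, state1), while the `done` flag and the `qValues` function (a stand-in for the network's forward pass) are hypothetical.

```javascript
// Sketch of a Q-learning regression target in the style of deepqlearn.js.
// `qValues` is a hypothetical stand-in for the network's forward pass; it
// returns one estimated value per action, e.g. [Q(Wait), Q(StopReel)].
function tdTarget(experience, qValues, gamma) {
  const { action0, reward0, state1, done } = experience;
  // Terminal transitions (e.g. all reels stopped) have no future value.
  const future = done ? 0 : Math.max(...qValues(state1));
  // Only the output of the action actually taken gets a target; {dim, val}
  // mirrors the struct that ConvNetJS's regression layer accepts.
  return { dim: action0, val: reward0 + gamma * future };
}

// Toy Q-function with fixed values, for illustration only.
const toyQ = () => [0.2, 0.5];
const target = tdTarget({ action0: 1, reward0: 0, state1: null, done: false }, toyQ, 0.9);
console.log(target); // { dim: 1, val: 0.45 }
```

Because the target is built from the network's own outputs, anything that inflates those outputs also inflates the next targets, so a bug in the output or loss path can show up as values that keep growing.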

Here is what I have checked so far.

Two components are still unclear to me. The regression layer of ConvNetSharp was introduced only recently, and I'm not sure whether I'm using the Volume (i.e. tensor) object as its author intended. As I'm not familiar with the actual implementation details of neural nets, I cannot tell whether the issue is caused by ConvNetSharp or not. I have been in touch with the author of ConvNetSharp a few times, but still haven't made progress on this issue. Parts of it are tracked on GitHub (https://github.com/cbovar/ConvNetSharp/issues/55#issuecomment-319089086).


It would be great if someone had some fresh ideas for gaining new insights into the underlying issue.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 08, 2017, 03:37:53 pm
 Exploding values? Do your neurons use a squashing function?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 08, 2017, 05:50:43 pm
I'm not sure whether that is desirable for regression. As far as I understand the ConvNetSharp code, no squashing function is applied in the regression layer during training. Regardless, the values are expected to lie in the range 0 to maybe 2. Growing beyond that range is simply wrong for the slot machine.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Korrelan on August 08, 2017, 07:22:51 pm
I'm not familiar with this setup, but I would check the Sigmoid function in the new version of ConvNetSharp.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 08, 2017, 07:46:16 pm
 Tanh will keep the output between -1 and 1, which is about the same thing:

http://mathworld.wolfram.com/HyperbolicTangent.html


Could it be the Sigmoid or ReLU function times two?

https://www.youtube.com/watch?v=-7scQpJT7uo&t=2s


 If, early in a deep neural network, the signal coming out of a neuron gets pinned at its maximum or minimum, then the information is lost to the following layer. If the signal bounces in and out of maximum or minimum output and spends 50 percent of the time there, then the information is only available to the following layer 50 percent of the time. Or, if the signal is not needed at all to begin with, but is pulled within detection and the following layers use it, wouldn't that be bad too?
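
The saturation effect described above can be made concrete: once the input to tanh is large, its output is pinned near ±1 and its derivative 1 - tanh(x)^2 is close to zero, so very little signal or gradient passes through. (ReLU, which the hidden layer in this thread uses, does not saturate for positive inputs, but outputs zero, with zero gradient, for negative ones.) A small illustrative sketch:

```javascript
// Output and derivative of tanh for a few pre-activation values.
// d/dx tanh(x) = 1 - tanh(x)^2, which vanishes as |x| grows (saturation).
function tanhWithGrad(x) {
  const y = Math.tanh(x);
  return { x, y, grad: 1 - y * y };
}

for (const x of [0, 1, 3, 6]) {
  const { y, grad } = tanhWithGrad(x);
  console.log(`x=${x}  tanh=${y.toFixed(4)}  grad=${grad.toExponential(2)}`);
}
```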

 So I guess you would be normalizing your data to between 0 and 2, or -1 and 1?
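
For reference, a linear min-max rescaling into a target range such as [-1, 1] can be sketched as follows. Encoding the six item types as IDs 0..5 is an illustrative assumption, not how the simulation in this thread necessarily encodes them.

```javascript
// Linearly rescale values from [min, max] to [lo, hi], by default [-1, 1].
function rescale(values, min, max, lo = -1, hi = 1) {
  if (max === min) throw new Error("min and max must differ");
  return values.map(v => lo + ((v - min) * (hi - lo)) / (max - min));
}

// Example: hypothetical item IDs 0..5 mapped to [-1, 1].
console.log(rescale([0, 1, 5], 0, 5));
```

For categorical inputs like slot items, a one-hot encoding (one input per item type) is often preferable to a single scaled ID, because scaled IDs impose an artificial ordering on the items.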




Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 08, 2017, 07:51:42 pm
This is how the neural net is set up:

Input Layer (9 nodes)
Hidden Layer, ReLU Activation (10 nodes)
Output Layer, Regression (2 nodes)
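
As an illustration of this architecture (not ConvNetSharp's API), here is a plain-JavaScript forward pass for a 9 -> 10 (ReLU) -> 2 (linear) network. The random initialisation scaled by 1/sqrt(nIn) and the output order [Q(Wait), Q(StopReel)] are assumptions.

```javascript
// Fully connected layer with small random weights scaled by 1/sqrt(nIn)
// (a common heuristic) and zero biases. `rand` returns values in [0, 1).
function makeLayer(nIn, nOut, rand) {
  const scale = 1 / Math.sqrt(nIn);
  const w = Array.from({ length: nOut }, () =>
    Array.from({ length: nIn }, () => (rand() * 2 - 1) * scale));
  const b = new Array(nOut).fill(0);
  return { w, b };
}

// out[j] = b[j] + sum_i w[j][i] * x[i]
function dense(layer, x) {
  return layer.w.map((row, j) =>
    row.reduce((sum, wij, i) => sum + wij * x[i], layer.b[j]));
}

const relu = v => v.map(z => Math.max(0, z));

function forward(net, input) {
  const hidden = relu(dense(net.hidden, input));
  // No activation on the regression output: Q-values are unbounded.
  return dense(net.output, hidden);
}

const net = { hidden: makeLayer(9, 10, Math.random), output: makeLayer(10, 2, Math.random) };
console.log(forward(net, new Array(9).fill(0.5))); // two values: [Q(Wait), Q(StopReel)]
```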

The Sigmoid activation function is not used. The loss of the regression layer is computed similarly to this (this is the ConvNetJS version (https://github.com/karpathy/convnetjs/blob/5c653cb15a53c3361a6cf21c7277de941ff05352/src/convnet_layers_loss.js#L117)):

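// y = {dim, val}: only one output receives a target
var i = y.dim;        // index of the output (action) being trained
var yi = y.val;       // target value for that output
var dy = x.w[i] - yi; // prediction error on that single output
x.dw[i] = dy;         // gradient of 0.5*dy*dy with respect to x.w[i]
loss += 0.5*dy*dy;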

I could use Softmax for the last layer, but that is meant for classification, not regression. The inputs are normalized.
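
A quick way to rule out a broken loss or gradient in a regression layer is a numerical gradient check. The sketch below reimplements the single-dimension loss from the ConvNetJS snippet above in plain JavaScript (it does not call ConvNetSharp or ConvNetJS) and compares the analytic gradient with a central difference.

```javascript
// Regression loss on a single output dimension, as in the {dim, val} case:
// L = 0.5 * (x[dim] - val)^2, with analytic gradient dL/dx[dim] = x[dim] - val.
function regressionLoss(x, y) {
  const dy = x[y.dim] - y.val;
  const dx = new Array(x.length).fill(0);
  dx[y.dim] = dy; // all other outputs receive no gradient
  return { loss: 0.5 * dy * dy, dx };
}

// Central-difference check: |(L(x + h) - L(x - h)) / (2h) - analytic| per output.
function gradCheck(x, y, h = 1e-5) {
  const { dx } = regressionLoss(x, y);
  return x.map((_, i) => {
    const plus = x.slice(); plus[i] += h;
    const minus = x.slice(); minus[i] -= h;
    const numeric = (regressionLoss(plus, y).loss - regressionLoss(minus, y).loss) / (2 * h);
    return Math.abs(numeric - dx[i]);
  });
}

console.log(gradCheck([0.3, 1.7], { dim: 1, val: 1.0 })); // errors should be tiny
```

A sign error here (val - x instead of x - val) makes gradient descent push the outputs away from their targets, which is one way to end up with values that grow without bound.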
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 08, 2017, 08:08:10 pm
 A slot machine is the worst thing for this example. What the "F" is your input data? The input from the last spin? Or data logged from the past hundred spins?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 08, 2017, 08:24:46 pm
Single-armed and contextual bandits have been used before for reinforcement learning tasks.

About the slot machine:
Each reel has 3 slots. Each slot is occupied by an item (Peach, Cherry, ..., Seven). As soon as the slot machine starts, each reel starts spinning (pulling new items from a custom probability distribution). The first stop action stops the left reel, the second stops the middle reel and, of course, the last stop event stops the right reel. So this is not a typical one-armed bandit where all reels stop at the same time or automatically. It works like the slot machines in the games Digimon World or Pokemon:
https://www.youtube.com/watch?v=6JbHfkEdokE
Each tick of the main loop updates each reel. A decision (wait or stop a reel) has to be made on each tick.

Maybe the slot machine is not the right fit for Deep Reinforcement Learning, but it already reveals issues with the DQN implementation. The old implementation does not suffer from the high-value issue, but the newer one does. I could implement ConvNetJS's apples and poison example (http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html) for additional tests.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 14, 2017, 09:33:34 am
Here is a small update:

The author of ConvNetSharp fixed a major bug in the regression layer, so the outputs no longer grow exponentially.

The next step is to keep testing the implementation. For the slot machine example, I haven't achieved a reasonable result yet; I'm going to try out different reward signals soon. The Apples&Poison demo slows down severely as soon as training starts.

Does anybody know of a good demo for verification? The goal is to successfully train a model within minutes.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 14, 2017, 02:43:49 pm
 So the operator of this slot machine can stop a spinning wheel or reel when they see the desired symbols showing? Then they can do the same with the other two spinning reels? Or, when the first one is stopped, do the other reels stop in a chain reaction? What about the order in which the wheels are selected? What about the time between selecting wheels?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 14, 2017, 03:09:29 pm
When the slot machine starts, all 3 reels start spinning. The available actions are StopReel and Wait. StopReel stops the first reel, while the other two reels keep spinning. Triggering StopReel again stops the second, so only the third one is still spinning. And of course the last reel is stopped by executing StopReel once more.

It's not a typical one-armed bandit that just works on a single probability. The agent looks at the state of the whole slot machine: it observes all the slots' items and can then decide to stop one reel. After each action (either wait or stop), the slot machine is updated, so the spinning reels shift their slots' items down. For every change of the slots, the agent is asked whether to stop or wait.
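
To make the mechanics concrete, here is a hypothetical sketch of such a slot machine. It is not the simulation linked earlier in the thread: the item names beyond Seven, Peach and Cherry, the weights, and the simplified final score (1 if all three middle items match, else 0) are placeholders.

```javascript
// Placeholder items and sampling weights (six item types, matching 6^9 states).
const ITEMS = ["Seven", "Peach", "Cherry", "Bell", "Bar", "Melon"];
const WEIGHTS = [1, 2, 3, 4, 5, 5];

// Sample one item from the weighted distribution; `rand` returns [0, 1).
function sampleItem(rand) {
  const total = WEIGHTS.reduce((a, b) => a + b, 0);
  let r = rand() * total;
  for (let i = 0; i < ITEMS.length; i++) {
    r -= WEIGHTS[i];
    if (r < 0) return ITEMS[i];
  }
  return ITEMS[ITEMS.length - 1]; // guard against rounding at the top end
}

class SlotMachine {
  constructor(rand = Math.random) {
    this.rand = rand;
    // reels[k] = [top, middle, bottom] visible items of reel k.
    this.reels = [0, 1, 2].map(() => [0, 1, 2].map(() => sampleItem(rand)));
    this.stopped = 0; // reels stop left to right
  }

  // Flattened 9-item observation (would still need encoding for a network).
  observe() {
    return [].concat(...this.reels);
  }

  // action: "Wait" or "StopReel". Returns { reward, done }.
  step(action) {
    if (action === "StopReel" && this.stopped < 3) this.stopped++;
    // Spinning reels shift down one slot; a new item enters at the top.
    for (let k = this.stopped; k < 3; k++) {
      this.reels[k].pop();
      this.reels[k].unshift(sampleItem(this.rand));
    }
    const done = this.stopped === 3;
    // Simplified scoring (assumption): 1 if all middle items match, else 0.
    const middle = this.reels.map(r => r[1]);
    const reward = done && middle.every(m => m === middle[0]) ? 1 : 0;
    return { reward, done };
  }
}

const machine = new SlotMachine();
console.log(machine.observe()); // nine visible items
```

The stop is applied before the reels shift, so a stopped reel keeps exactly the items the agent saw when it chose StopReel.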


For this example I came up with new reward signals. Before, the reward was based on the total outcome, meaning the score achieved after stopping all reels.
Now, each reel provides a reward or punishment. The first one rewards the agent based on the item (e.g. a 7 is worth 1 and a cherry is worth 0.01). The second reel is all about whether the agent managed to score a match or not: a match grants a reward of 1, and a mismatch punishes the agent with -0.5. The third reel behaves the same way as the second concerning the reward.
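
A minimal sketch of this shaped reward, assuming that "match" means the stopped reel's middle item equals the first reel's middle item (one reading of the description), that items other than Seven and Cherry are worth 0 on the first reel, and that Wait yields 0. The function and table names are hypothetical.

```javascript
// Item values: only Seven = 1 and Cherry = 0.01 come from the post; any
// other item is assumed to be worth 0 here.
const ITEM_VALUE = { Seven: 1, Cherry: 0.01 };

// Shaped reward for stopping reel `reelIndex` (0, 1 or 2). `middleItems`
// holds the middle-row items of the reels stopped so far, including this one.
// A Wait action is assumed to yield 0 and is not handled here.
function stopReward(reelIndex, middleItems) {
  if (reelIndex === 0) {
    const value = ITEM_VALUE[middleItems[0]];
    return value === undefined ? 0 : value;
  }
  // Reels 2 and 3: +1 if the item matches the first reel's item, else -0.5.
  return middleItems[reelIndex] === middleItems[0] ? 1 : -0.5;
}

console.log(stopReward(0, ["Seven"]));                    // 1
console.log(stopReward(1, ["Seven", "Seven"]));           // 1
console.log(stopReward(2, ["Seven", "Seven", "Cherry"])); // -0.5
```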


Edit:
One more piece of information: the slot machine has 6^9 (about 10 million) states.
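
For reference, assuming six item types and nine visible slots, the count works out as:

```latex
6^9 = \left(6^3\right)^3 = 216^3 = 10\,077\,696 \approx 1.0 \times 10^7
```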
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 22, 2017, 02:07:45 pm
Progress is still slow. I still haven't gotten a good result for the slot machine, so I switched to the Apples&Poison demo. This revealed a huge lack of performance, because ConvNetSharp seems to be single-threaded (no imports for using threads). The old port of ConvNetJS, which comes with the Apples&Poison demo, runs much faster, but is single-threaded as well.

As ConvNetSharp features CUDA, I wanted to overcome this problem for now by letting the GPU do the job, but from that point on I've been pushed from one exception to the next. The first one required setting the project to 64-bit only, and now I'm stuck on a CUDA exception concerning memory allocation.

ConvNetSharp does not come with any documentation or comments in the code. What do you think? Should I stick with ConvNetSharp? The alternatives would be to build an interface to Python or to implement neural networks myself. Of course, implementing them takes much more effort, but it has educational advantages. And I'd approach it using compute shaders in Unity, which wouldn't limit the usage to NVIDIA GPUs.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 22, 2017, 02:36:18 pm
Hi,
You weighed the pros & cons of implementing NNs yourself, which I understand. But what are the pros of sticking with ConvNetSharp? (I mean, given that it doesn't work nicely)
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 22, 2017, 02:57:16 pm
(http://www.devbeyond.de/external/table.png)
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 23, 2017, 08:27:28 pm
The "GPU exceptions" con overshadows two of the four pros, doesn't it?

Also, don't you think the educational benefits of DIY weigh heavily in the balance?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 23, 2017, 08:34:16 pm
I think I'll keep an eye on ConvNetSharp, but progress depends on the author's contributions. So the plan is to focus more on a small-scale example that can be verified using Python; one of the OpenAI environments could be considered. After that, I'll start prototyping an interface to Python to resolve uncertainties. And maybe in some leisure time I could try prototyping neural nets with compute shaders in Unity.

So the priority is an appropriate example to serve as an integration test. Then I'd move on to prototyping.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 23, 2017, 09:24:32 pm
Good to see you made a decision!

Should an "appropriate example" be related to some real-time activity with rewards and punishments?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 23, 2017, 09:55:36 pm
The example just has to prove the functionality. Without that, I'm not going to tackle more complex scenarios. That's why I originally came up with the slot machine simulation.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 24, 2017, 02:39:15 pm
Why is the slot machine a bad test?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 24, 2017, 02:41:45 pm
Because I haven't gotten a reasonable result yet. I'd have to try it out with Python. All in all, the failure could be due to the slot machine setup or to the DQN implementation.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Korrelan on August 24, 2017, 04:14:53 pm
Hi Marco.

I’ve enjoyed following your progress but you are always going to have these kinds of problems using someone else’s libraries.

I believe it would be a better use of your time and effort to learn the underlying principles and write your own system from scratch. You are never going to achieve your desired results unless you have an in-depth comprehensive understanding of the problem space.

Also… perhaps I’m misunderstanding the slot machine idea, but NNs/CNNs are designed to find order/repetition/patterns in a complex system. The nature of wheel selection on a ‘bandit’ is supposed to be totally random… are you hoping the CNN will find a logical pattern in a pseudo-random system?

 :)
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 24, 2017, 04:21:59 pm
Quote from: Korrelan
I believe it would be a better use of your time and effort to learn the underlying principles and write your own system from scratch. You are never going to achieve your desired results unless you have an in-depth comprehensive understanding of the problem space.

I partially agree with that. It's just that you don't need to know every detail of a library's underlying implementation; otherwise tons of people would fail to make use of neural nets or whatever else a library provides. As for ConvNetSharp, I don't feel safe using it, because of frequent issues. Well, the thought of doing something completely from scratch is always pretty exciting to me.

Concerning the slot machine, it is not like a regular one-armed bandit. Probability only comes into play when picking the new items for the upper slots. I wrote down a more detailed paragraph on that here (https://docs.google.com/document/d/1lBXG0lwoIBkPUeeyNPrhvXC8sCsY2FX81BCOa5atNA0/edit?usp=sharing).
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 24, 2017, 05:41:26 pm
Quote from: Zero
Why is the slot machine a bad test?

 It is a coin flip. If the coin had wings, ailerons and a rudder on it, so that it could land heads up all the time, then a DQN could learn by trial and error how to control the landing.

If a DQN had poor control, over only one aileron and wing, then it would find the statistically best policy.
If a DQN had no control, then it could not learn the best policy, because the outcome is completely random.

 I am having a really hard time seeing any flight control in this slot machine. If a winning policy can be found for a Las Vegas slot machine, then it is rigged, broken or flawed.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 24, 2017, 05:46:05 pm
For each tick (the items of the slot machine are shifted downwards), the agent can decide to stop or wait. A simple if-condition could already get the best outcome by stopping the reels on a seven. There is no way for the agent to miss a slot.
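
Such an if-condition also makes a useful sanity check: a trained agent should do at least as well as it. A hypothetical sketch, assuming each reel is represented as [top, middle, bottom] and that the number of already-stopped reels is passed in explicitly (the 9-slot input described in this thread does not necessarily include it):

```javascript
const WAIT = 0, STOP_REEL = 1; // hypothetical action encoding

// Baseline policy: stop the next spinning reel as soon as its middle slot
// shows a Seven, otherwise wait. Reels stop left to right, so reel
// `stopped` is the next one to stop.
function baselinePolicy(reels, stopped) {
  if (stopped >= reels.length) return WAIT; // nothing left to stop
  return reels[stopped][1] === "Seven" ? STOP_REEL : WAIT;
}

const reels = [["Peach", "Seven", "Cherry"], ["Seven", "Peach", "Peach"], ["Seven", "Seven", "Seven"]];
console.log(baselinePolicy(reels, 0)); // 1 (StopReel)
console.log(baselinePolicy(reels, 1)); // 0 (Wait)
```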
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 24, 2017, 08:21:19 pm
I really don't get why everybody seems to misunderstand Marco's slot machine. There's no randomness involved; the NN has to recognize sequences, that's it. Or am I the one who doesn't understand?

Anyway, discarding the test because the results aren't as expected doesn't seem fair. Of course, an already known and well-established test would remove uncertainty, since right now you don't know where the problem comes from, the test or the agent.

About making it yourself, I'd go this way too, because in this case NNs aren't a peripheral tool, but the very thing you're working on. Libraries are good for peripheral stuff, IMO.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 24, 2017, 09:51:47 pm
 I am totally unfamiliar with the slot machine he is using. I need more info on it; I am not going to spend all day looking for it.
 It is very nice of him to think of us as all-knowing super scientists, but I like my info explained to me like I am a 4th or 5th grader.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 24, 2017, 10:01:46 pm
For this reason I posted a link to a more detailed description: (https://docs.google.com/document/d/1lBXG0lwoIBkPUeeyNPrhvXC8sCsY2FX81BCOa5atNA0/edit?usp=sharing)

"The slot machine does not work like a single arm bandit, which outputs a result upon pulling the lever. It works like the slot machine in the games Pokemon and Digimon World, thus the user has to stop reel by reel. In more detail, the slot machine starts with all three reels spinning. For each tick of the main-loop, the items in the reels’ slots are updated (shifted down). The upper slot receives a new item, which is selected based on a custom probability distribution. When the user decides to stop a reel (otherwise wait), the first reel stops. The other reels are stopped accordingly. As soon as all reels are stopped, the slot machine concludes and is evaluated. If the items of the middle slot of each reel match, the slot machine will output a score."
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 24, 2017, 10:20:41 pm
All reels are turning. Every time you click, one reel stops. When all are stopped, if there are three of the same thing, you get a reward. Correct?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 24, 2017, 10:25:20 pm
Quote from: Zero
All reels are turning. Every time you click, one reel stops. When all are stopped, if there are three of the same thing, you get a reward. Correct?

The functionality is correct, though I'm treating the rewards differently. But yes, having the same item three times is the ultimate goal.

Right now, stopping the first reel always yields a reward based on the item's value (e.g. +1 for a seven). After that, stopping each of the other two reels yields +1 for matching the previous reels and -0.5 for a mismatch.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Zero on August 25, 2017, 02:09:45 pm
Ok. The test looks good to me, NN should be able to learn how to win.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Art on August 25, 2017, 09:41:32 pm
What if they are started individually and then stopped all at once? Or one at a time?

How would those patterns be affected?

I still think randomness plays a big role here, as the player has no control other than perhaps starting or stopping the wheels (preferably without seeing the displays on the wheels).

Thoughts?
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 25, 2017, 10:01:02 pm
The simulation does not progress until the agent decides to stop or wait. The agent gets the currently visible items as inputs. So if the agent sees a seven in the first reel's slot, it could decide to stop immediately.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on August 26, 2017, 02:29:52 am
The goal is to land heads up with a DQN.
 An original coin-like plane with two wings, ailerons, a rudder and a tail wing has a 99 percent chance of finding a way to land heads up after 12 hours of training.
 A coin with undersized controls, 25 percent of full size, will have a 70 percent chance of landing heads up after 1000 hours of training.
 A plane with all control surfaces at 10 percent of the original size will land heads up 55 percent of the time and will take 15 years to train.
 A coin with no wings will have a 50 percent chance of landing heads up and will take forever to train.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Art on August 26, 2017, 12:59:12 pm
What about the coin (a nickel in this case) landing on its edge?
http://adsabs.harvard.edu/abs/1993PhRvE..48.2547M

Life always has its exceptions and oddities.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on August 26, 2017, 01:07:30 pm
Here is an update concerning the implementation:

- A bug was fixed in the loss computation of the regression layer
- The CUDA exception is related to the CUDA context initialization, which might have to be done in the same thread
- The training data pair was composed incorrectly

As of now, I cannot tell whether this will lead to a breakthrough, because I don't have much time right now to run further tests and check the CUDA performance.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on September 07, 2017, 10:15:26 pm
This is most likely the last update. The GPU issues got fixed, but it turns out that the GPU version runs much slower than the CPU one. Improving performance is not feasible for me due to the missing documentation and comments.

So my plan now is to focus completely on Python. At the end of July, Unity announced that an API for using TensorFlow and similar frameworks would arrive within a few weeks, so maybe the whole matter will become obsolete.

In the meantime, I started a repository on GitHub to provide some experimental environments for deep reinforcement learning.
https://github.com/MarcoMeter/AI-Learning-Environments
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: Marco on October 25, 2017, 05:25:22 pm
One month ago, Unity released its ML-Agents (https://github.com/Unity-Technologies/ml-agents).
This is what I started to work with.
Title: Re: Ideas/opinions for troubleshooting exploding output values (DQN)
Post by: keghn on October 25, 2017, 07:49:04 pm


https://github.com/Unity-Technologies/ml-agents