Ideas/opinions for troubleshooting exploding output values (DQN)

*

Marco

« Reply #15 on: August 23, 2017, 08:34:16 pm »
I think I'll keep an eye on ConvNetSharp, but its progress depends on the author's contributions. So the plan is to focus on a small-scale example that can be verified in Python; one of the OpenAI environments could be a candidate. After that, I'll start prototyping an interface to Python to resolve the remaining uncertainties. And maybe in my spare time I'll try prototyping neural nets with compute shaders in Unity.

So the priority is an appropriate example to serve as an integration test. Then I'd move on to prototyping.
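One way to get a small-scale example that can be verified independently is a toy problem whose optimal Q-values are known in closed form. The sketch below is a hypothetical integration test, not something from the thread: tabular Q-learning on a five-state chain (the chain, `GAMMA`, and all function names are assumptions), checked against Q* computed by hand. A DQN trained on the same chain could be compared against the same reference numbers.

```python
import random

# Hypothetical toy problem: a 5-state chain, not one of the thread's environments.
N_STATES = 5          # states 0..4; state 4 is the terminal goal
GAMMA = 0.9           # discount factor
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: reaching the goal pays +1, everything else pays 0."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy (it is off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # No bootstrapping from the terminal state.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def optimal_q():
    """Closed-form Q* for the chain, used as the reference answer."""
    # V*(s) = GAMMA ** (steps to goal - 1) for non-terminal s; V*(goal) = 0
    v = [GAMMA ** (N_STATES - 2 - s) for s in range(N_STATES - 1)] + [0.0]
    q = []
    for s in range(N_STATES - 1):
        left = GAMMA * v[max(s - 1, 0)]
        right = 1.0 if s + 1 == N_STATES - 1 else GAMMA * v[s + 1]
        q.append([left, right])
    q.append([0.0, 0.0])  # terminal state
    return q


learned, reference = q_learning(), optimal_q()
for s in range(N_STATES):
    print(s, [round(x, 3) for x in learned[s]], [round(x, 3) for x in reference[s]])
```

If the learned table does not match the closed-form one, the bug is in the learner, because the environment here is too simple to be the culprit.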

*

Zero

« Reply #16 on: August 23, 2017, 09:24:32 pm »
Good to see you made a decision!

Should an "appropriate example" be related to some real-time activity with rewards and punishments?

*

Marco

« Reply #17 on: August 23, 2017, 09:55:36 pm »
The example just has to prove that the implementation works. Without that, I'm not going to tackle any more complex scenarios. That's why I originally came up with the slot machine simulation.

*

Zero

« Reply #18 on: August 24, 2017, 02:39:15 pm »
Why is the slot machine a bad test?

*

Marco

« Reply #19 on: August 24, 2017, 02:41:45 pm »
Because I haven't gotten a reasonable result yet. I'd have to try it out in Python. All in all, the failure could be due either to the slot machine setup or to the DQN implementation.
« Last Edit: August 24, 2017, 03:41:39 pm by Marco »

*

Korrelan

« Reply #20 on: August 24, 2017, 04:14:53 pm »
Hi Marco.

I’ve enjoyed following your progress, but you are always going to have these kinds of problems using someone else’s libraries.

I believe it would be a better use of your time and effort to learn the underlying principles and write your own system from scratch. You are never going to achieve your desired results unless you have an in-depth comprehensive understanding of the problem space.

Also… perhaps I’m misunderstanding the slot machine idea, but NNs/CNNs are designed to find order/repetition/patterns in a complex system. Wheel selection on a ‘bandit’ is supposed to be totally random… are you hoping the CNN will find a logical pattern in a pseudo-random system?

 :)

*

Marco

« Reply #21 on: August 24, 2017, 04:21:59 pm »
Quote from: Korrelan
I believe it would be a better use of your time and effort to learn the underlying principles and write your own system from scratch. You are never going to achieve your desired results unless you have an in-depth comprehensive understanding of the problem space.

I partially agree with that. It's just that you don't need to know every detail of a library's underlying implementation; otherwise tons of people would fail to make use of neural nets or whatever else a library provides. As for ConvNetSharp, I don't feel safe using it because of frequent issues. Well, the thought of doing something completely from scratch is always pretty exciting to me.

Concerning the slot machine, it is not like a regular one-armed bandit. Probability only comes into play when new items are picked for the upper slots. I wrote down a more detailed paragraph on that here.

*

keghn

« Reply #22 on: August 24, 2017, 05:41:26 pm »
Quote from: Zero
Why is the slot machine a bad test?

It is a coin flip. If the coin had wings, ailerons, and a rudder so that it could land on heads every time, then a DQN could learn by trial and error how to control the landing.

If a DQN had poor control, say over just one aileron and wing, it would find the statistically best policy.
If a DQN had no control at all, it could not learn a best policy, because the outcome is completely random.

I am having a really hard time seeing any flight control in this slot machine. If a winning policy can be found for a Las Vegas slot machine, then it is rigged, broken, or flawed.
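keghn's argument can be made concrete with a two-action toy problem. This is a hypothetical illustration (the payout probabilities and `estimate_action_values` are made up, and it is not Marco's slot machine): when the action changes the payout probability, the estimated action values separate and a learner has something to prefer; when the outcome is a pure coin flip, both estimates converge to the same value and no policy does better than any other.

```python
import random


def estimate_action_values(win_prob, trials=20_000, seed=1):
    """Average reward per action under a uniformly random policy.

    win_prob[a] is the (hypothetical) chance that action a pays +1.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(win_prob)
    counts = [0] * len(win_prob)
    for _ in range(trials):
        a = rng.randrange(len(win_prob))
        reward = 1.0 if rng.random() < win_prob[a] else 0.0
        totals[a] += reward
        counts[a] += 1
    return [t / c for t, c in zip(totals, counts)]


# "Wings and ailerons": the action changes the odds, so one action is clearly better.
controllable = estimate_action_values([0.1, 0.9])
# Pure coin flip: the action has no effect, so both estimates end up near 0.5.
uncontrollable = estimate_action_values([0.5, 0.5])
print([round(v, 2) for v in controllable], [round(v, 2) for v in uncontrollable])
```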

*

Marco

« Reply #23 on: August 24, 2017, 05:46:05 pm »
For each tick (the slot machine's items are shifted downwards), the agent can decide to stop or wait. A simple if-condition could already achieve the best outcome by stopping each reel on a seven. There is no way for the agent to miss a slot.

*

Zero

« Reply #24 on: August 24, 2017, 08:21:19 pm »
I really don't get why everybody seems to misunderstand Marco's slot machine. There's no randomness involved; the NN has to recognize sequences, that's it. Or am I the one who doesn't understand?

Anyway, discarding the test because the results aren't as expected doesn't seem fair. Obviously, though, an already known and well-established test would remove the uncertainty: with a custom test, you don't know where the problem comes from, the test or the agent.

As for making it yourself, I'd go that way too, because in this case NNs aren't a peripheral tool but the very thing you're working on. Libraries are good for peripheral stuff, IMO.

*

keghn

« Reply #25 on: August 24, 2017, 09:51:47 pm »
I am totally unfamiliar with the slot machine he is using. I need more info on it. I am not going to spend all day looking for it.
It is very nice of him to think of us as all-knowing super scientists. But I like my info explained to me like I am a 4th or 5th grader.

*

Marco

« Reply #26 on: August 24, 2017, 10:01:46 pm »
For this reason I posted a link to a more detailed description:

"The slot machine does not work like a one-armed bandit, which outputs a result when the lever is pulled. It works like the slot machines in the games Pokemon and Digimon World, so the user has to stop the reels one by one. In more detail, the slot machine starts with all three reels spinning. On each tick of the main loop, the items in the reels’ slots are updated (shifted down), and the upper slot receives a new item selected from a custom probability distribution. When the user decides to stop (rather than wait), the first reel stops; the remaining reels are stopped the same way, one after another. As soon as all reels are stopped, the slot machine concludes and is evaluated: if the items in the middle slot of every reel match, the slot machine outputs a score."

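The description above maps fairly directly onto a small environment. The sketch below is hypothetical: the item names, spawn weights, payout table, and the `SlotMachine`/`play_stop_on_seven` names are assumptions, since the post does not specify them. It also includes the if-condition baseline from Reply #23, which stops the next reel whenever its middle slot shows a seven.

```python
import random

# Hypothetical items, spawn weights, and payouts; the post does not specify them.
ITEMS = ["seven", "bar", "cherry", "bell"]
WEIGHTS = [0.1, 0.2, 0.3, 0.4]
SCORES = {"seven": 100, "bar": 50, "cherry": 20, "bell": 10}


class SlotMachine:
    """Sketch of a stop-reel-by-reel slot machine (not Marco's actual code)."""

    def __init__(self, n_reels=3, n_slots=3, seed=None):
        self.rng = random.Random(seed)
        self.n_slots = n_slots
        # reels[r][0] is the upper slot; reels[r][n_slots // 2] is the middle slot.
        self.reels = [[self._draw() for _ in range(n_slots)] for _ in range(n_reels)]
        self.stopped = 0  # reels stop in order, so this is also the next reel's index

    def _draw(self):
        # New items are sampled from the custom distribution.
        return self.rng.choices(ITEMS, weights=WEIGHTS, k=1)[0]

    def tick(self):
        """One main-loop tick: spinning reels shift down and get a new upper item."""
        for reel in self.reels[self.stopped:]:
            reel.pop()                    # the lowest item leaves the window
            reel.insert(0, self._draw())  # a new item enters the upper slot

    def stop(self):
        """Stop the next spinning reel."""
        self.stopped += 1

    @property
    def done(self):
        return self.stopped == len(self.reels)

    def middle(self, r):
        return self.reels[r][self.n_slots // 2]

    def score(self):
        """Evaluate once all reels are stopped: only matching middle items pay out."""
        middles = {self.middle(r) for r in range(len(self.reels))}
        return SCORES[middles.pop()] if len(middles) == 1 else 0


def play_stop_on_seven(machine, max_ticks=10_000):
    """Rule-based baseline: stop the next reel whenever its middle slot shows a seven."""
    for _ in range(max_ticks):
        if machine.done:
            break
        if machine.middle(machine.stopped) == "seven":
            machine.stop()  # this tick's decision: stop
        machine.tick()      # reels that are still spinning advance either way
    return machine.score()


# Every item passes through the middle slot, so the rule ends on seven-seven-seven.
print(play_stop_on_seven(SlotMachine(seed=42)))
```

Since this trivial rule always reaches the top payout, a DQN that cannot match it suggests the problem lies with the agent (or with how states and rewards are fed to it) rather than with the environment itself.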
*

Zero

« Reply #27 on: August 24, 2017, 10:20:41 pm »
All reels are turning. Every time you click, one reel stops. When all are stopped, if there are 3 times the same thing, you get reward. Correct?

*

Marco

« Reply #28 on: August 24, 2017, 10:25:20 pm »
Quote from: Zero
All reels are turning. Every time you click, one reel stops. When all are stopped, if there are 3 times the same thing, you get reward. Correct?

The functionality is correct, though I'm treating the rewards differently. But yes, getting the same item three times is the ultimate goal.

Right now, stopping the first reel always yields a reward based on the item's value (e.g. +1 for a seven). After that, stopping each of the other two reels yields +1 if it matches the previous reels and -0.5 otherwise.
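The reward scheme described above can be written as a small function. This is a sketch: the values for items other than the seven, the zero reward for waiting, the name `stop_reward`, and the reading of "a match of the previous reels" as "matches every reel stopped so far" are assumptions.

```python
# Hypothetical values for the first stop; only "+1 for a seven" comes from the post.
FIRST_STOP_VALUE = {"seven": 1.0, "bar": 0.5, "cherry": 0.25, "bell": 0.1}
MATCH_REWARD = 1.0
MISMATCH_REWARD = -0.5
WAIT_REWARD = 0.0  # assumption: waiting is neither rewarded nor punished


def stop_reward(stopped_middles):
    """Reward for the stop that was just made.

    stopped_middles: middle-slot items of all reels stopped so far, in stop
    order, ending with the reel that was just stopped.
    """
    if len(stopped_middles) == 1:
        return FIRST_STOP_VALUE[stopped_middles[0]]
    latest = stopped_middles[-1]
    # Assumption: "a match of the previous reels" means matching all of them.
    matches = all(item == latest for item in stopped_middles[:-1])
    return MATCH_REWARD if matches else MISMATCH_REWARD


# Upper bound on the return of one episode (seven, match, match).
MAX_RETURN = max(FIRST_STOP_VALUE.values()) + 2 * MATCH_REWARD

episode = ["seven", "seven", "bar"]
print([stop_reward(episode[: i + 1]) for i in range(len(episode))])  # [1.0, 1.0, -0.5]
print(MAX_RETURN)  # 3.0
```

Under these assumptions no episode can return more than 3.0, with or without discounting, so Q-value estimates that keep climbing well past that bound point to divergence rather than to a good policy.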

*

Zero

« Reply #29 on: August 25, 2017, 02:09:45 pm »
Ok. The test looks good to me, NN should be able to learn how to win.
