Ideas/opinions for troubleshooting exploding output values (DQN)

Art · « **Reply #30 on:** August 25, 2017, 09:41:32 pm »

What if they are started individually then stopped all at once? One at a time?

How would those patterns be affected?

I still think randomness plays a big role here, as the player has no control other than perhaps starting or stopping the wheels (preferably without seeing the displays on the wheels).

Thoughts?

Marco · « **Reply #31 on:** August 25, 2017, 10:01:02 pm »

The simulation does not progress until the agent decides to stop or wait. The agent gets the currently visible items as inputs. So if the agent sees a seven on the first reel's slot, it could decide to stop immediately.
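A minimal sketch of that interaction loop, not the actual Unity implementation: the symbol set, the `WAIT`/`STOP` action encoding, the left-to-right stopping order, and the reward for matching symbols are all assumptions made for illustration. It only shows that time advances when the agent acts and that the observation is the set of currently visible symbols.

```python
import random

SYMBOLS = ["cherry", "bell", "bar", "seven"]  # hypothetical symbol set
WAIT, STOP = 0, 1                             # hypothetical action encoding


class SlotMachineEnv:
    """Toy slot machine: the reels only move when step() is called."""

    def __init__(self, num_reels=3, seed=None):
        self.rng = random.Random(seed)
        self.num_reels = num_reels
        self.reset()

    def reset(self):
        # Random start position per reel; `active` is the next reel to stop.
        self.positions = [self.rng.randrange(len(SYMBOLS))
                          for _ in range(self.num_reels)]
        self.active = 0
        return self.observe()

    def observe(self):
        # The agent sees the currently visible symbol on every reel.
        return [SYMBOLS[p] for p in self.positions]

    def step(self, action):
        if self.active >= self.num_reels:
            raise RuntimeError("episode is over; call reset()")
        if action == STOP:
            self.active += 1  # freeze the current reel at its visible symbol
        # Every reel that is still spinning advances by one symbol.
        for i in range(self.active, self.num_reels):
            self.positions[i] = (self.positions[i] + 1) % len(SYMBOLS)
        done = self.active == self.num_reels
        # Assumed reward: 1.0 if all stopped reels show the same symbol.
        reward = 1.0 if done and len(set(self.positions)) == 1 else 0.0
        return self.observe(), reward, done
```

A hand-written policy such as "stop whenever the active reel shows a seven, otherwise wait" can be run against this loop to get a baseline before training a DQN on it.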

keghn · « **Reply #32 on:** August 26, 2017, 02:29:52 am »

The goal is to land heads up with a DQN.
An original coin-like plane with two wings, ailerons, a rudder, and a tail wing has a 99 percent chance of finding a way to land after 12 hours of training.
A coin with undersized controls, 25 percent of full size, will have a 70 percent chance of landing heads up after 1000 hours of training.
A plane whose control surfaces are all 10 percent of the original size will land heads up 55 percent of the time and will take 15 years to train.
A coin with no wings will have a 50 percent chance of landing heads up and will take forever to train.

Art · « **Reply #33 on:** August 26, 2017, 12:59:12 pm »

What about the coin (a nickel in this case) landing on its edge?
http://adsabs.harvard.edu/abs/1993PhRvE..48.2547M

Life always has its exceptions and oddities.

Marco · « **Reply #34 on:** August 26, 2017, 01:07:30 pm »

Here is an update concerning the implementation:

- A bug in the loss computation of the regression layer was fixed
- The CUDA exception is related to the CUDA context initialization, which might have to be done in the same thread
- The training data pairs were composed incorrectly
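The post does not say what the loss bug was, so the following is only a generic sketch of how a DQN regression loss and its training pair are commonly formed, not the fix itself. The function names, the discount `gamma`, and the use of a Huber loss are assumptions. The two points it illustrates are that the target is built from the reward plus the discounted best next Q-value (with no bootstrap on terminal states), and that the error is propagated only through the output of the action that was actually taken. The Huber loss is shown because it clips the gradient, which is a common measure against exploding output values.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """Regression target for the taken action (standard Q-learning target)."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(next_q_values)


def huber(error, delta=1.0):
    """Quadratic near zero, linear for large errors."""
    a = abs(error)
    if a <= delta:
        return 0.5 * error * error
    return delta * (a - 0.5 * delta)


def dqn_loss_and_grad(q_values, action, target, delta=1.0):
    """Loss and gradient w.r.t. the network outputs for one transition.

    Only the output of the taken action receives a gradient; all other
    outputs get zero, so they are not pulled towards arbitrary values.
    """
    error = q_values[action] - target
    loss = huber(error, delta)
    grad = [0.0] * len(q_values)
    # Derivative of the Huber loss: the error clipped to [-delta, delta].
    grad[action] = max(-delta, min(delta, error))
    return loss, grad
```

With a plain squared error the gradient would equal the raw error, so a single badly composed target can produce a very large update; the clipped gradient keeps each step bounded.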

As of now, I cannot tell whether this is leading to a breakthrough, because there is not much time right now to run further tests and check the CUDA performance.

Marco · « **Reply #35 on:** September 07, 2017, 10:15:26 pm »

This is most likely the last update. The GPU issues got fixed, but it turns out that the GPU version runs much slower than the CPU one. Improving performance is not feasible for me due to the missing documentation and comments.

So my plan is now to focus completely on Python. At the end of July, Unity said that an API for using TensorFlow and similar libraries would be available within a few weeks. So maybe the whole matter will become obsolete.

In the meantime, I started a repository on GitHub to provide some experimental environments for deep reinforcement learning.
https://github.com/MarcoMeter/AI-Learning-Environments

Marco · « **Reply #36 on:** October 25, 2017, 05:25:22 pm »

One month ago, Unity released its ML Agents.
This is what I started to work with.

keghn · « **Reply #37 on:** October 25, 2017, 07:49:04 pm »

https://github.com/Unity-Technologies/ml-agents
