QLearning/Experience replay with texas holdem (python)

jcbdev · « **on:** January 04, 2018, 08:51:59 pm »

Hello!

Just joined the forum. I've been working on a fun project over Christmas and thought I'd drop by to get some advice/feedback from some real pros.

Here is the project:
https://github.com/jcbdev/holdemq

The general concept is to build a bot (I'm thinking Slack-based at the moment) that constantly plays itself in the background, hopefully improving all the time. I also hope to let real people play games against it over chat. Those real games will be fed back into the background trainer too.

It's very early stages and I haven't really figured out what my network or hyperparameters should look like yet, but the spine of the code is there. Currently it just starts a table with 10 AI players and runs that over and over again ad infinitum.

I'd appreciate it if anyone has the time to have a look and offer advice, from an AI perspective, on what I could do (or even just fun suggestions!).

ivan.moony · « **Reply #1 on:** January 04, 2018, 09:00:27 pm »

Hi and welcome

May I ask, what is the starting set of knowledge? Do the AI players have some predefined knowledge, or are they just guessing the game rules as they play?

jcbdev · « **Reply #2 on:** January 04, 2018, 09:32:29 pm »

Hello!

No, there is no starting set of knowledge. The network takes in the current state of the board and outputs an action (check/fold/raise etc.).

Over time it builds a "memory" of the actions taken against a particular board state and the resulting rise or fall in the player's stack/pot (used as the score to measure performance). Q-learning uses this memory to replay previous experiences (in order) constantly to try to learn a prediction function that optimizes the chance of eventual success. It's based on Google DeepMind's work that taught an AI to play Atari 2600 games with no prior knowledge of the rules.
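To make the "memory" idea concrete, here is a minimal sketch of a replay memory and the standard one-step Q-learning target. It is not taken from the holdemq repository: the names `ReplayMemory`, `remember`, `sample`, `q_target` and the `GAMMA` value are all assumptions for illustration. Note also that this sketch samples stored experiences at random, as in the published DQN work, which may differ from the in-order replay described above.

```python
import random
from collections import deque

GAMMA = 0.95  # discount factor; an assumed value, not from the project


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10000):
        # Oldest experiences are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch; never asks for more items than are stored.
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))


def q_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # Hand is over, so there is no future value to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)
```

In a poker setting the reward would be the change in the player's stack, and `next_q_values` would come from the network's prediction for the next board state; the network is then trained to move its output for the chosen action toward `q_target`.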

There is also an epsilon parameter that controls the network's chance of taking a random action instead of the action the network predicts. At the beginning this is high (so it takes lots of random actions), but it decays over time (the hope being that the network knows what it's doing by that point!).
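The epsilon mechanism can be sketched roughly as follows. This is an illustration only: the `ACTIONS` list, the function names, and the decay rate and floor are hypothetical values, not the ones used in the project.

```python
import random

# Hypothetical action set; the real project may encode actions differently.
ACTIONS = ["fold", "check", "call", "raise"]


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the action with the highest predicted Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


def decay_epsilon(epsilon, decay=0.995, minimum=0.01):
    """Shrink epsilon multiplicatively, but never below a small floor."""
    return max(minimum, epsilon * decay)
```

A training loop would call `choose_action` on every decision and `decay_epsilon` after each hand or each training step. Keeping a small floor means the bot never stops exploring entirely.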

Zero · « **Reply #3 on:** January 06, 2018, 11:13:27 am »

Welcome, jcbdev.

That's an interesting project! Have you considered participating in an existing AI poker competition?

ranch vermin · « **Reply #4 on:** January 06, 2018, 11:20:51 am »

I've heard about Q-learning. Does the Q come from "quantum", just as a catchphrase? I also heard that Q-learning came from this guy who made the dimitri hexapod here ->

Are you using an existing library, or is it from scratch? I'm a from-scratch developer myself, and I like really fast-access networks (that are possibly a little lossy), because getting the number of Q-samples up could be the secret to tackling problems with larger sensor spaces.

I bet Texas hold'em poker would have a nice small fixed sensor space, which would be good for a beginner's implementation.

keghn · « **Reply #5 on:** January 09, 2018, 04:26:27 pm »

Q Learning Explained:

ivan.moony · « **Reply #6 on:** January 09, 2018, 08:30:55 pm »

Maybe you should make a phone app out of it, call it "Win Machine" and go to Vegas to test it. And then spend all the winnings on girls.

jcbdev · « **Reply #7 on:** February 07, 2018, 04:20:14 pm »

Do you think they'd get suspicious if I kept checking my phone after every move?

ivan.moony · « **Reply #8 on:** February 07, 2018, 05:14:23 pm »

Hehe, it would be fine if we had those tech glasses.

AgentSmith · « **Reply #9 on:** March 16, 2018, 03:32:03 am »

Your approach seems similar to https://en.wikipedia.org/wiki/TD-Gammon, where a value function is learned by letting bots play against each other. The difference is that TD-Gammon learns a state-value function, whereas you are learning a Q-function. Q-functions are more complex and harder to learn than state-value functions. But since Texas hold'em, unlike backgammon, is not fully observable, it seems to be a reasonable approach, because Q-functions do not rely on a probabilistic model of the successor state as state-value functions do.
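A rough sketch of that difference when it comes to picking an action. Everything here is hypothetical and for illustration only: `successor_model`, `value_fn` and `q_fn` stand in for whatever model or network is actually used.

```python
def act_with_state_values(state, actions, successor_model, value_fn):
    """Choose an action from a state-value function V(s).

    Needs successor_model(state, action) -> list of (probability, next_state),
    i.e. a probabilistic model of what happens after each action.
    """
    def expected_value(action):
        return sum(p * value_fn(s2) for p, s2 in successor_model(state, action))

    return max(actions, key=expected_value)


def act_with_q_values(state, actions, q_fn):
    """Choose an action from a Q-function Q(s, a); no successor model needed."""
    return max(actions, key=lambda a: q_fn(state, a))
```

With a state-value function, the agent has to enumerate the possible successor states and their probabilities, which is hard in hold'em because the opponents' cards are hidden. With a Q-function, the expected outcome of each action is folded into the learned value itself.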
