QLearning/Experience replay with texas holdem (python)

*

jcbdev

  • Roomba
  • *
  • 5
QLearning/Experience replay with texas holdem (python)
« on: January 04, 2018, 08:51:59 pm »
Hello!

Just joined the forum.  I've been working on a fun project over Christmas and thought I'd drop by here to get some advice/feedback from some real pros  ;)

here is the project:
https://github.com/jcbdev/holdemq

The general concept is to build a bot (I'm thinking Slack-based at the moment) that constantly plays itself in the background, hopefully improving all the time.  I also hope to let real people play games against it over chat.  These real games will be fed back into the background trainer too.

It's very early stages and I haven't really figured out what my network or hyperparameters should look like yet, but the spine of the code is there.  Currently it will just start a table with 10 AI players and run that over and over again ad infinitum.

I would appreciate it if anyone has the time to have a look and offer any advice, from an AI perspective, on what I could do (or even just fun suggestions!).

*

ivan.moony

  • Trusty Member
  • **********
  • Millennium Man
  • *
  • 1039
    • Structured Type System
Re: QLearning/Experience replay with texas holdem (python)
« Reply #1 on: January 04, 2018, 09:00:27 pm »
Hi and welcome :)

May I ask, what is the starting set of knowledge? Do the AI players have some predefined knowledge, or are they just guessing the game rules as they play on?
Dream big. The bigger the dream is, the more beautiful place the world becomes.

*

jcbdev

  • Roomba
  • *
  • 5
Re: QLearning/Experience replay with texas holdem (python)
« Reply #2 on: January 04, 2018, 09:32:29 pm »
Hello!

No, there is no starting set of knowledge.  The network takes in the current state of the board and outputs an action (check/fold/raise etc.).

Over time it builds a "memory" of actions taken against a particular board state and the resulting rise or fall in the player's stack/pot (used as the score to measure performance).  Q-learning uses this memory to replay previous experiences (in order) constantly, to try to learn a prediction function that optimizes the chance of eventual success.  It's based on Google DeepMind's work that taught an AI to play Atari 2600 games with no prior knowledge of the rules.
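To make the replay idea concrete, here is a minimal sketch, not the code from the repository. It assumes a tabular Q-function instead of a network, and the state strings, the three-action list, `ReplayMemory` and `q_update` are all hypothetical names made up for illustration. It also samples the memory at random, as the DeepMind Atari work does, rather than replaying in order.

```python
import random
from collections import defaultdict, deque

ACTIONS = ["fold", "check", "raise"]  # hypothetical action set


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop off first

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling is an assumption of this sketch (see the note above).
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def q_update(q, transition, alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    state, action, reward, next_state, done = transition
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# Toy hand: raising pre-flop pays nothing yet, raising on the flop wins 10 chips.
q = defaultdict(float)
memory = ReplayMemory()
memory.remember("preflop:AA", "raise", 0.0, "flop:AA", False)
memory.remember("flop:AA", "raise", 10.0, "end", True)

for _ in range(200):  # replay the stored experiences many times
    for transition in memory.sample(2):
        q_update(q, transition)
```

After enough replays the value of the winning flop raise is passed back to the pre-flop raise, discounted by `gamma`, which is the "eventual success" part: the early action gets credit for the later payoff.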

At the start there is an epsilon parameter that controls the network's chance of taking a random action instead of the action the network predicts.  At the beginning this is high (so it takes lots of random actions), but it decays over time (the hope being that the network knows what it's doing by that point!)
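The epsilon mechanism can be sketched in a few lines. This is only an illustration under assumptions: `choose_action` and `decay` are hypothetical helpers, and the decay rate of 0.995 and floor of 0.05 are made-up values, not the project's settings.

```python
import random


def choose_action(q_values, epsilon):
    """Return an action index: random with probability epsilon, else the best one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


def decay(epsilon, rate=0.995, floor=0.05):
    """Shrink epsilon a little after each hand, but never below the floor."""
    return max(floor, epsilon * rate)


epsilon = 1.0  # start fully random
for _ in range(1000):
    action = choose_action([0.1, 0.9, 0.3], epsilon)
    epsilon = decay(epsilon)
```

Keeping a small floor means the bot never stops exploring completely, which may matter in self-play where the opponents keep changing.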


*

Zero

  • Trusty Member
  • ********
  • Replicant
  • *
  • 587
  • Not dead yet
    • Thinkbots are free
Re: QLearning/Experience replay with texas holdem (python)
« Reply #3 on: January 06, 2018, 11:13:27 am »
Welcome, jcbdev.

That's an interesting project! Have you considered participating in an existing AI poker competition?
Thinkbots are free, as in 'free will'.

*

ranch vermin

  • Not much time left.
  • Replicant
  • ********
  • 715
  • Its nearly time!
Re: QLearning/Experience replay with texas holdem (python)
« Reply #4 on: January 06, 2018, 11:20:51 am »
I've heard about Q-learning.  Does the Q come from "quantum", just as a catchphrase?   I also heard that Q-learning came from this guy that made the dimitri hexapod here ->

Are you using an existing library, or is it from scratch?   I'm a from-scratch developer myself, and I like really fast-access networks (that are possibly a little lossy), because getting the number of Q-samples up could be the secret to tackling problems with larger sensor spaces.

I bet Texas hold'em poker would have a nice small fixed sensor space, which would be good for a beginner's implementation.

*

keghn

  • Trusty Member
  • *********
  • Terminator
  • *
  • 855
Re: QLearning/Experience replay with texas holdem (python)
« Reply #5 on: January 09, 2018, 04:26:27 pm »

Q Learning Explained: 


*

ivan.moony

  • Trusty Member
  • **********
  • Millennium Man
  • *
  • 1039
    • Structured Type System
Re: QLearning/Experience replay with texas holdem (python)
« Reply #6 on: January 09, 2018, 08:30:55 pm »
Maybe you should make a phone app out of it, call it "Win Machine" and go to Vegas to test it. And then spend all the winnings on girls :D

*

jcbdev

  • Roomba
  • *
  • 5
Re: QLearning/Experience replay with texas holdem (python)
« Reply #7 on: February 07, 2018, 04:20:14 pm »
Do you think they'd get suspicious if I kept checking my phone after every move?

*

ivan.moony

  • Trusty Member
  • **********
  • Millennium Man
  • *
  • 1039
    • Structured Type System
Re: QLearning/Experience replay with texas holdem (python)
« Reply #8 on: February 07, 2018, 05:14:23 pm »
hehe, it would be fine if we had those tech glasses :)

*

AgentSmith

  • Bumblebee
  • **
  • 27
Re: QLearning/Experience replay with texas holdem (python)
« Reply #9 on: March 16, 2018, 03:32:03 am »
It seems that your approach is similar to TD-Gammon (https://en.wikipedia.org/wiki/TD-Gammon), where you try to learn a value function by letting bots play against each other. The difference is that TD-Gammon learns a state-value function, whereas you are learning a Q-function. Q-functions are more complex and harder to learn than state-value functions. But since Texas hold'em is not completely observable the way backgammon is, it seems to be a reasonable approach: choosing an action from a Q-function does not rely on a probabilistic model of the successor state, as choosing one from a state-value function does.
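A small sketch of that difference, under assumptions: the function names, the toy `model`, and all the numbers below are hypothetical and only meant to show which inputs each kind of value function needs when picking an action.

```python
def greedy_from_v(state, actions, model, v, gamma=1.0):
    """Pick an action from a state-value function V.

    This needs model(state, action) -> [(probability, reward, next_state), ...],
    i.e. a probabilistic model of the successor states.
    """
    def expected_value(action):
        return sum(p * (r + gamma * v[s2]) for p, r, s2 in model(state, action))

    return max(actions, key=expected_value)


def greedy_from_q(state, actions, q):
    """Pick an action from a Q-function: a lookup per action, no model needed."""
    return max(actions, key=lambda action: q[(state, action)])


# Toy decision: calling wins 60% of the time, folding ends the hand at zero.
actions = ["call", "fold"]
v = {"win": 1.0, "lose": -1.0, "out": 0.0}


def model(state, action):
    if action == "call":
        return [(0.6, 0.0, "win"), (0.4, 0.0, "lose")]
    return [(1.0, 0.0, "out")]


q = {("river", "call"): 0.2, ("river", "fold"): 0.0}

choice_v = greedy_from_v("river", actions, model, v)
choice_q = greedy_from_q("river", actions, q)
```

Both pick "call" here, but only `greedy_from_v` had to be told the win probability. In poker that probability depends on hidden cards, which is why learning Q directly looks like the more practical route.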

 

