QLearning/Experience replay with texas holdem (python)

*

jcbdev

  • Roomba
  • *
  • 5
QLearning/Experience replay with texas holdem (python)
« on: January 04, 2018, 08:51:59 pm »
Hello!

Just joined the forum.  I've been working on a fun project over Christmas and thought I'd drop by here to get some advice/feedback from some real pros  ;)

here is the project:
https://github.com/jcbdev/holdemq

The general concept is to build a bot (I'm thinking Slack-based at the moment) that constantly plays itself in the background, hopefully improving as it goes.  I also hope to allow real people to play games against it over chat.  These real games will get fed back into the background trainer too.

It's very early stages and I haven't really figured out what my network or hyperparameters should look like yet, but the spine of the code is there.  Currently it will just start a table with 10 AI players and run that over and over again ad infinitum.

Would appreciate it if anyone has the time to have a look and offer any advice, from an AI perspective, on what I could do (or even just fun suggestions!).

*

ivan.moony

  • Trusty Member
  • ************
  • Bishop
  • *
  • 1729
    • mind-child
Re: QLearning/Experience replay with texas holdem (python)
« Reply #1 on: January 04, 2018, 09:00:27 pm »
Hi and welcome :)

May I ask, what is the starting set of knowledge? Do the AI players have some predefined knowledge, or are they just guessing the game rules as they play on?

*

jcbdev

  • Roomba
  • *
  • 5
Re: QLearning/Experience replay with texas holdem (python)
« Reply #2 on: January 04, 2018, 09:32:29 pm »
Hello!

No, there is no starting set of knowledge.  The network takes in the current state of the board and outputs an action (check/fold/raise etc).

Over time it builds a "memory" of actions taken against a particular board state and the resultant rise or fall in the player's stack/pot (as the score to measure performance).  QLearning uses this memory to replay previous experiences (in order) constantly to try and learn a prediction function that optimizes the chance of eventual success.  It's based on Google DeepMind's work that taught an AI to play Atari 2600 games with no previous knowledge of the rules of the game.
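
For anyone following along, the "memory" idea can be sketched roughly like this. It's a minimal illustration, not the code from the holdemq repo: the names `Transition`, `ReplayMemory` and `q_target` are made up for this post, and it samples the memory uniformly at random (as the DeepMind Atari work does) rather than replaying in order.

```python
import random
from collections import deque, namedtuple

# One remembered step: what the board looked like, what was done,
# how the stack changed, and what the board looked like afterwards.
# (All names here are hypothetical, for illustration only.)
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-size memory; the oldest experiences fall off the front."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling (an assumption of this sketch).
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def q_target(reward, next_q_values, done, gamma=0.99):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # The hand is over, so there is no future value to add.
        return reward
    return reward + gamma * max(next_q_values)
```

During training, a batch from `sample()` would be turned into targets with `q_target()` and the network fitted towards them. The discount `gamma` and the capacity are placeholder values, not tuned ones.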

There is also an epsilon parameter that controls the network's chance of taking a random action instead of the action the network predicts.  At the beginning this is high (so it takes lots of random actions), but over time it decays (the hope being that the network knows what it's doing by this point!)
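
The epsilon idea can be sketched as below. Again this is just an illustration under my own assumptions: `select_action` and `decay_epsilon` are hypothetical names, and the multiplicative decay rate and the floor of 0.05 are placeholder numbers, not values from the project.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the highest predicted Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.05):
    """Shrink epsilon a little after each step, but never below epsilon_min."""
    return max(epsilon_min, epsilon * decay)
```

Keeping a small floor on epsilon means the bot never stops exploring entirely, which may matter in self-play where the opponents keep changing too.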


*

Zero

  • Eve
  • ***********
  • 1287
Re: QLearning/Experience replay with texas holdem (python)
« Reply #3 on: January 06, 2018, 11:13:27 am »
Welcome, jcbdev.

That's an interesting project! Have you considered participating in an existing AI poker competition?

*

ranch vermin

  • Not much time left.
  • Terminator
  • *********
  • 947
  • Its nearly time!
Re: QLearning/Experience replay with texas holdem (python)
« Reply #4 on: January 06, 2018, 11:20:51 am »
I've heard about Q-learning. Does the Q come from "quantum", just as a catchphrase?   I also heard that Q-learning came from this guy that made the dimitri hexapod here ->

Are you using an existing library, or is it from scratch?   I'm a from-scratch developer myself, and I like really fast access networks (that possibly are a little lossy), because getting the amount of Q-samples up could be the secret to tackling problems with larger sensor spaces.

I bet Texas hold'em poker would have a nice small fixed sensor space, which would be good for a beginner's implementation.

*

keghn

  • Trusty Member
  • *********
  • Terminator
  • *
  • 824
Re: QLearning/Experience replay with texas holdem (python)
« Reply #5 on: January 09, 2018, 04:26:27 pm »

Q Learning Explained: 


*

ivan.moony

  • Trusty Member
  • ************
  • Bishop
  • *
  • 1729
    • mind-child
Re: QLearning/Experience replay with texas holdem (python)
« Reply #6 on: January 09, 2018, 08:30:55 pm »
Maybe you should make a phone app out of it, call it "Win Machine" and go to Vegas to test it. And then spend all the winnings on girls :D

*

jcbdev

  • Roomba
  • *
  • 5
Re: QLearning/Experience replay with texas holdem (python)
« Reply #7 on: February 07, 2018, 04:20:14 pm »
Do you think they'd get suspicious if I kept checking my phone after every move?

*

ivan.moony

  • Trusty Member
  • ************
  • Bishop
  • *
  • 1729
    • mind-child
Re: QLearning/Experience replay with texas holdem (python)
« Reply #8 on: February 07, 2018, 05:14:23 pm »
hehe, it would be fine if we had those tech glasses :)

*

AgentSmith

  • Bumblebee
  • **
  • 37
Re: QLearning/Experience replay with texas holdem (python)
« Reply #9 on: March 16, 2018, 03:32:03 am »
It seems that your approach is similar to https://en.wikipedia.org/wiki/TD-Gammon, where you try to learn a value function by letting bots play against each other. The difference is that in TD-Gammon state-value functions are learned, whereas you are learning a Q-function instead. Q-functions are more complex and harder to learn than state-value functions. But since Texas hold'em is not fully observable the way backgammon is, it seems to be a reasonable approach, since Q-functions do not rely on a probabilistic model of the successor state as state-value functions do.
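
To make the difference concrete, here is a toy sketch (all names and numbers are made up for illustration). Picking an action from a Q-function is a plain lookup, while picking one from a state-value function needs a model of which successor states each action can lead to, and with what probability.

```python
def greedy_from_q(q, state, actions):
    """With a Q-function, just compare Q(s, a) across actions. No model needed."""
    return max(actions, key=lambda a: q[(state, a)])


def greedy_from_v(v, state, actions, model, reward, gamma=1.0):
    """With a state-value function, a one-step lookahead through a model is needed.

    model(state, action) is assumed to return a list of
    (probability, next_state) pairs; reward(state, action, next_state)
    is assumed to return the immediate payoff. Both are hypothetical helpers.
    """
    def expected_value(action):
        return sum(
            p * (reward(state, action, s2) + gamma * v[s2])
            for p, s2 in model(state, action)
        )

    return max(actions, key=expected_value)
```

In backgammon the successor model is easy to write down (dice probabilities plus the rules), but in poker it would depend on the hidden cards and the opponents' behaviour, which is why the lookup version is attractive.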

 

