This Superhuman Poker AI Was Trained in 20 Hours


Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. Today, the game we’ll be talking about is
six-player no-limit Hold’em poker, which is one of the more popular poker variants
out there. And the goal of this project was to build
a poker AI that has never played against a human before, learns entirely through self-play,
and is able to defeat professional human players. During these tests, two of the players that
it was tested against were former World Series of Poker Main Event winners. And of course, before you ask, yes, in a moment,
we’ll look at an example hand that shows how the AI traps a human player. Poker is very difficult to learn for AI bots
because it is a game of imperfect information. For instance, chess is a game of perfect information
where we see all the pieces and can make a good decision if we analyze the situation
well. However, not so much in Poker, because only
at the very end of the hand do the players show what they have. This makes it extremely difficult to train
an AI to do well. And now, let’s have a look at the promised
example hand here. We talked about imperfect information just
a moment ago, so I’ll note that all the cards are shown face up for us to make the
analysis of this hand easier; of course, this is not how the hands were played. You see the AI up there marked with P2 sittin’
pretty with a Jack and a Queen, and before the flop happens, which is when the first
three cards are revealed, only one human player seems to be interested in this hand. On the flop, the AI paired its Queen and
has a Jack as a kicker, which, if played well, is going to be disastrous for the human player. So, why is that? You see, the human player also paired their
queen, but has a weaker kicker and will therefore lose to the AI’s hand. In this case, this player thinks they have
a strong hand and will get lots of value out of it… only to find out that they will be
the one milked by the AI. So, how exactly does that happen? Well, look here carefully! The bot shows weakness by checking here, to
which the human player’s answer is a small raise. The bot, again, shows weakness by just calling
this raise, and checking again on the turn, essentially saying “I am weak, don’t hurt
me!”. By the time we get to the river, the AI, again,
appears weak to the human player, who now tries to milk the bot with a mid-sized raise…
and the AI recognizes that now is the time to pounce. The confused player calls the bet
and gets milked for almost all their money. An excellent slow play from the AI. Now, note that one hand is difficult to evaluate
in isolation; this was a great hand indeed, but we need to look at entire games to get
a better grasp of the capabilities of this AI. So if we look at the dollar-equivalent value
of the chips in the game, the AI was able to win a thousand dollars from these 5 professional
poker players…every hour. It also uses very few resources, can be
trained in the cloud for only several hundred dollars, and exceeds human-level performance
within only 20 hours. What you see here is a decision tree that
explains how the algorithm figures out whether to check or bet, and as you see here, this
tree is traversed in a depth-first way, so first, it descends deep into one possible
decision, and later, as more options are being unrolled and evaluated, the probabilities of
these choices are updated above. In simpler words, first, the AI seems somewhat
sure that checking would be a good choice here, but after carefully evaluating both
decisions, it is able to further reinforce this choice. One of the professional players noted that
the bot is a much more efficient bluffer than a human and always puts on a lot of pressure. Now note that this is also a general learning
technique and is not tailored specifically for poker, and as a result, the authors of
the paper noted that they will also try it on other imperfect information games in the
future. What a time to be alive! This episode has been supported by Weights
& Biases. Weights & Biases provides tools to track your
experiments in your deep learning projects. It can save you a ton of time and money in
these projects and is being used by OpenAI, Toyota Research, Stanford and Berkeley. It is really easy to use; in fact, this blog
post describes how you can visualize your Keras models with only one line of code. When you run this model, it will also start
saving relevant metrics for you and here you can see the visualization of the mentioned
model and these metrics as well. That’s it. You’re done! It can do a lot more than this, of course,
and, you know what the best part is? The best part is that it’s free and will
always be free for academics and open source projects. Make sure to visit them through wandb.com/papers
or just click the link in the video description and sign up for a free demo today. Our thanks to Weights & Biases for helping
us make better videos for you. Thanks for watching and for your generous
support, and I’ll see you next time!
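
The depth-first traversal described in the transcript (descend fully into one decision, then update the probabilities of the choices above it) can be illustrated with a toy sketch. This is not the authors' algorithm: the `Node` structure, the payoff numbers, the initial 0.6/0.4 guess, and the softmax update rule are all assumptions chosen for illustration, and every inner node is treated as the AI's own decision.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Toy decision-tree node (hypothetical structure, not from the paper)."""
    name: str
    payoff: Optional[float] = None  # set only on leaves (assumed chip value)
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> child
    probs: Dict[str, float] = field(default_factory=dict)  # action -> probability


def evaluate(node: Node, visit_order: List[str]) -> float:
    """Depth-first evaluation: finish one action's subtree before starting
    the next, then update this node's action probabilities."""
    visit_order.append(node.name)
    if not node.children:
        assert node.payoff is not None, "leaves must carry a payoff"
        return node.payoff
    values: Dict[str, float] = {}
    for action, child in node.children.items():
        values[action] = evaluate(child, visit_order)
    # Softmax over child values (an illustrative update rule, assumed here).
    best = max(values.values())
    weights = {a: math.exp(v - best) for a, v in values.items()}
    total = sum(weights.values())
    node.probs = {a: w / total for a, w in weights.items()}
    # Value of this node under the updated probabilities.
    return sum(node.probs[a] * values[a] for a in values)


def build_example() -> Node:
    """Check-or-bet tree with made-up payoffs and an initial guess at the root."""
    after_check = Node("check", children={
        "call": Node("check/call", payoff=2.0),
        "raise": Node("check/raise", payoff=6.0),
    })
    return Node(
        "root",
        children={"check": after_check, "bet": Node("bet", payoff=3.0)},
        probs={"check": 0.6, "bet": 0.4},  # "somewhat sure" before the search
    )


root = build_example()
order: List[str] = []
value = evaluate(root, order)
print(order)       # nodes in the order they were visited
print(root.probs)  # root probabilities after both subtrees were evaluated
```

In this toy tree the root starts out only somewhat in favour of checking, and after both branches have been explored the probability of checking ends up higher, which mirrors the "further reinforce this choice" remark in the transcript.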

100 thoughts on “This Superhuman Poker AI Was Trained in 20 Hours”

  1. It'd be fun to try Yu-Gi-Oh and also train the AI to build a deck with varying degrees of information on its opponent's deck.

  2. I wonder if someone dares to do "the forbidden" and put to the test different types of political structures to classify them from best to worst using neural networks.

  3. Is that 20 hours in wall-clock time or 20 hours in overall training time (i.e. 4 cores in parallel for 5 hours wall clock time)? If the latter, very impressive.

  4. It's an imperfect game, but it can be "perfectly" simulated. Very rare in real-world applications (i.e. outside of card games, video games, etc). 5/10.

  5. Inequality gets biger every day, the western culture slowly dies, the clintons kill in jail the satanic pederast celebrity pimp and the planet burns in flames But wait Robo-poker 2000 learns to fuck world champions in less than a day… what a time to live.

  6. The hand that was shown can not be evaluated out of context. If a beginner showed me how he had won this hand and asked me for an evaluation, I would tell him that he overplayed his cards and was lucky to get called by pretty much the only worse hand that might pay him off. However, now that we know that the hand was played by a strong bot, that has established a table image of being a frequent bluffer, we can reevaluate the river check/raise as a brilliant merging of ranges. Knowing that it is capable of doing this, the humans can no longer consider a river check/raise just a very polarised bet that either means a total bluff or a super strong hand, like three of a kind, but now they also have to consider that it may be a medium strength hand, where they have to decide if their kicker is good enough.

    This is scary. I am happy that I stopped playing online poker for money a long time ago. This is the official death blow to online poker, since it is now impossible to know if you are playing against a complete noob or a superhuman AI before you have lost so many hands that it is clear that it is not some random luck box.

  7. I think a computer can easily be better at poker than real human players, because a big part of poker is exploiting human psychology and recognizing behavioural patterns. A computer is very unpredictable.

  8. Professional online poker players know how to play well against human opponents, since this is what they are always doing. Although the results shown in this video are nice, we don't know how difficult it would be for a human to learn to play against a bot like this, suppose that the human player would deliberately train against bots like this one.

  9. "What a time to be alive!" in the sense your uncle will have to quit online poker and will stop taking loans from Mom.

  10. It doesn't use any neural networks, right? It uses a 64-core processor and 512GB of RAM. It should be optimized with NN.

  11. "What a time to be alive" yeah, let's suck the fun out of everything with computers. What a time I have lived: I drove my own car, called real people on the phone, had a surgeon do my surgery, said hello to a bus driver…. I'm really looking forward to losing all of these little things because big companies make more money by putting AI into everything. Dehumanizing the world.

  12. Great work.

    Paper:
    https://drive.google.com/file/d/1AaxZql7wio_R5vUKY_07QTYRq4-f9WIW/view?usp=sharing

    Edit: Didn't see it in the first link so I got it myself …then noticed the direct second link. 😉

  13. Wow, that example hand was nuts. Mainly because any good poker player knows that you generally go aggressive there and raise. Not raising until the end is thought to be super risky. Generally, you just want to win your pot ASAP. Very interesting stuff!

  14. This is where science and real-life do not go well together.
    The AI was actually down in chips after playing 10,000 hands. But, after adjusting with variance-reduction (AIVAT) it was up in chips.

    It doesn't change the fact that it lost money.

  15. Wasn't poker "cheat ridden" anyway? Like, there MUST HAVE BEEN assistance in calculating the chances. And…. that's it, at the end of the day. Play safe, play long hours and you will come out in the positive.

  16. I don't like your explanation for the A.I. play… she can't know that her kicker is better, as the player would, assuming he has a queen, raise pre-flop with Q-10 to AQ (so again, assuming he has exactly one queen, it only beats 1 out of 3 hands he can have). The check post-flop from the A.I. out of position is pretty standard, so is the c-bet from P6 and the call with top pair and possible straight draws. The A.I. checking that turn is again pretty standard, especially since the villain was the aggressor pre-flop and you have top pair (so you want to induce a bet if your opponent thinks he can bluff you off the hand). The check back from P6 makes sense since he has no more draws with the turn, and there is the possibility that the A.I. is slow playing a better hand (AQ/QQ/KK/AA/99/77… although I assume the A.I. would have 3-bet AQ/QQ/KK pre-flop), plus the fact that he won't get 3 streets of value from a worse hand. Now the check-raise from the A.I. on the river is really weird, especially considering the sizing of the raise; you would expect it to either be a bluff or a monster, not a top pair 3rd kicker. It's very unlikely that the A.I.'s hand improved post-flop (the only hands that would are pocket deuces or threes). So, assuming the A.I. 3-bets AQ/AK/QQ/KK and maybe some smaller pairs pre-flop, the only hands in its very wide big-blind calling range that beat/tie you are AA/99/77/22/33/KQ/QJ/Q10/97 (suited) and that's about it. As I said, I wouldn't expect such a raise from Q10 to KQ since you can easily be out-kickered by P6, so in my head the A.I. has either one of the monsters I listed or one of the many more bluffs it could have check-called up to the river. For that reason (and unless we know the A.I. can slow play hands like AQ/QQ/KK pre-flop out of position) the call makes some sense, although you only beat bluffs.
    It's really that huge raise from the A.I. that's weird and probably a bad play, as it will have most worse hands (except some Q10) folding but most better hands (except some QJ/KQ) calling (and monsters raising all-in).

  17. I could finally win at Poker Night at the Inventory if I had this AI…
    I always wanted Strong Bad’s “Dangeresque Too” glasses…

  18. This hand is so common online, I do not see anything special about it. Also, your explanation of "strength" refers to ABC basic poker. I play these traps myself, so does this make me a superhuman BIO-AI? ;D

  19. Should we be worried that this thing is being developed by Facebook?…
    Because, as has been said, poker is a fight where you don't know what your opponent has, so it's pretty worrying to me that FB has a very efficient tool for that 😐

  20. River check-raise seems a little bit thin for value but I suppose the button's range is somewhat capped after the turn check back. I just don't see a lot of worse hands that will call the raise (other than the exact QT button had). Maybe it's a bit of a merge play as you would fold out some KQ and maybe AQ while still getting value from a couple of worse Qx hands? I'd love to see the analysis because one hand where you have your opponent pipped doesn't show much.

  21. Some subjects are not for everyone. I mean, even a FISH in poker would notice that you have no idea what you are talking about. I'm not trying to be offensive, but poker is not for everyone.

  22. I would assume that the A.I is always bluffing. I wouldn't bet a lot if I have a weak hand. Anything less than a pair is weak.

  23. How does the bot know that its QJ is strong enough to re-raise the opponent? Did QT raise preflop?

    Hope this is not one of those holecard seeing bots.

  24. FYI, in some of the examples shown, both the flop and river plays by the human were bets, not raises. However, I am sure the bot knows this; after all, only humans make mistakes.

  25. The AI doesn't show 'weakness' on the Q flop when he checks the flop to the preflop raiser. Imo that's pretty standard lol

  26. I thought the computing power needed for NLHE was way too high.
    Limit hold’em has been mastered since there are far fewer variables to count because bets are limited. If he is not lying about beating WSOP winners and they played many hands, that's extremely impressive.

  27. so many people commenting on how the QJ hand was played poorly, when it was played perfectly by the bot.
    you have to understand that the bot has seen this exact spot essentially millions of times and gone through every possible outcome, so it knows.
    It's not using exploitative, assumptive logic that the commenters use to exploit fish in their local home game.

  28. Meh. I’d be curious to see what the sample size is. There have been tons of poker bots over the years. Some will inevitably run hot. Even a hundred thousand hands has a lot of variance in NLH. Especially with that aggro style. And the thousands of dollars is technically irrelevant. How many BB’s are they winning per hundred hands?

    Also, all play styles are exploitable when you know what the opponent is going to do. If you knew in advance that the Bot will play QJ like this, simply tightening up your range will beat hyper aggression. The more money your opponent is willing to put in the pot post-flop, the more you can afford to lose pre-flop because you’ll make up for it with overall higher showdown value.

  29. It's not superhuman. It's still a very impressive AI but the claims made by the researchers are completely disproportionate.

  30. Lol, not impressed. I've seen amateurs pull the same moves the AI did. Poker is 70% luck and the AI has no chance of winning continuously, especially if there are wild players and conservatives at the table; the AI won't be anything special… NEXT

  31. So there's a lot of misinformation floating around about this A.I.

    #1 The sample size was only 10,000 hands, which to some of you may seem like a lot, but actual professional poker players consider a sample size of at least 100,000 hands necessary to see if you're a true winning player and to negate most variance.

    #2 The bot had extreme difficulty making computations with varying stack sizes, and its performance worsened the more the stack sizes varied.

    So, while impressive I would not consider it better than humans until I've seen more data

  32. Okay pluribus is pretty good although he played in a 6max format against players who weren't experts in 6max cash but in mtt's (tournament format) Which is pretty important.
    Also I heard that after the end of every hand the stack sizes were reset to 100 big blinds. That means the bot doesn't know how to play short stacked or deep stacked poker only the standard 100bb. Also some of the hands played by pluribus were analysed in a gto (game theory optimal) solver and it basically showed that even the AI wasn't playing close to a perfect strategy.
    I'd like to see a rematch but this time against the top 6 max players and without resetting the stacks. Only then you could say that AI can beat humans.

  33. Why would you team up with Chris Ferguson in a project about a poker AI? That dude is famous for stealing millions from players in an online poker scam. Only just this year did they finally permit him to play in the world series of poker because his temporary ban expired.

  34. 1# These top players knew they were playing versus an AI from the start, so they also played differently than usual (my guess). I wonder what the outcome would be if nobody had told them that they were playing vs AI.
    2# Play vs AI for 100k hands and the story would be different.
    3# If I play vs AI at a full table or 6max, the problem is that it’s hard to predict his range since he’s calling pretty wide regardless of position..
    4# AI only looks strong since he’s not getting tired or tilted; in other words, he’s not having emotions, which is a good thing in poker: all you need is to stick to your game and do your best. And that’s what AI is doing, but is he better than top players? I highly doubt it..

  35. All poker and casino sites have something in their terms of service about using certain types of programs…….. that being said, if you get caught you're not going to jail, but your IP and account will be banned forever on the site. Also, they probably give this information out to other sites, so you could get blacklisted very quickly.

  36. If u see AI does a thin value raise on the river like that next time u are in that spot u can trap him with weak river cbet with a stronger hand

  37. It was a genius play if you think about it. The check back on the turn by P6 was actually really bad because the AI had to know that KQ or AQ is going to bet 3 streets. So either the AI gets a massive amount of value for having the higher kicker or it wins the pot anyway, because no weaker hand than QT is going to call the re-raise on the river. I think it's quite profitable in the long run.

    If P6 had c-bet the turn, the river would have been a check-check and P6 wouldn't have lost so many chips.

  38. Raise with QJ top pair? I want to play with this AI for real money; I would crush its mechanical little head. So if the example in this video is how it works then it’s really simple: bet huge, it’s gonna think I have a big hand; if it calls, then give up.

  39. Please tell us who are the two wsop main event players the AI played against. Surprising to see that none of those players admitted anywhere that they actually played this game against AI

  40. Being a main event champion doesn’t mean you’re a top tier player. I hate that they act like that’s a big deal

  41. So let me get this straight? A.I. has difficulty analyzing data and coming up with solutions when the information isn't presented whole (weakness), so you guys figured "Hey, let's get rid of the only weakness A.I. has so they can make Skynet a reality". People of the future, you have these guys to thank.
