
This is a really interesting writeup, especially if you know a bit about how Dota works.

That it managed to learn creep blocking from scratch was really surprising to me. To creep block, you need to go out of your way to stand in front of the creeps and consciously keep doing so until they reach their destination. Creep blocking just a bit is almost imperceptible; you need to do it all the way to get a big reward out of it.

I also wonder whether their reward function directly rewarded good lane equilibrium or whether that emerged indirectly from the other reward terms.



They link to a short description of the reward function in the blog: https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae939...


It's not really "from scratch". The bots are rewarded for the number of creeps they block, so it's not impossible that they would find some behavior to influence this score.


That was true for their original 1v1 bot, but in the latest blog post they mention bots can learn it on their own if left to train longer.


That's not rigorously supported. It's just an anecdote they mention off-hand. The final version of the bot does use the creep block reward.


To be clear:

- The 1v1 bot played at The International used a special creep block reward (and a big if statement separating that part of the agent from the self-play trained part). It trained for two weeks.

- A 2v2 bot discovered creep blocking on its own, no special reward. It trained for four weeks.

- OpenAI Five does not have a creep blocking reward, but neither (to our knowledge) does it creep block currently. Trained for 19 days!


I see. Thanks! So it manages to win lanes without even creep blocking? That's quite good. Any chance you could share the last hits @ 10 mins for the games it has played (for both bots and humans)? I think that's a crucial number to judge how OpenAI Five is winning its games.


I believe the article said that Blitz rated the bot's last-hitting as about average for humans, although he might overrate what an average human player's last-hitting looks like.


Yeah, he might be overestimating 2.5k MMR players, and there's also something to be said about the consistency with which the bot last hits. A human player would have high variance in last-hit performance, while the bot will probably guarantee a minimum, thus ensuring the minimum set of items needed for the transition into the mid game.

But my larger point is that the early game doesn't have a lot of strategic elements in it. You have to last hit, not die, harass your opponent, and get items. You can pretty much play it by the book. The challenge in the early game is handling five different things at the same time. So there's never really a question of what to do, but doing it does require mechanical prowess, and we know bots can easily outperform humans at that.

The team composition chosen is very oriented toward an early-game snowball. So is the bot winning simply due to mechanical superiority and an early-game advantage? Access to last hits @ 10 mins, plus gold and net worth graphs, would allow us to answer that question.



