
Hello! Spinning Up author here. I would love to hear your thoughts on this! So far I have had a lot of success teaching people about DDPG using this code example, but I'm grateful for every fresh perspective. :)

Feel free to reach out by email, jachiam at openai.


There is no function in the world that should ever take 17 parameters. If the algorithm permits that much configuration, as I am sure it does, then the function should take a configuration object that holds all of these values. The object could then be constructed using special-purpose factories that take fewer parameters, and customized from there as needed.

It may be an indication that the whole thing needs refactoring though.
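
For instance, something like this (a hypothetical sketch; the names are made up, not taken from the Spinning Up code):

    from dataclasses import dataclass, replace

    @dataclass
    class DDPGConfig:
        """Hypothetical container for DDPG's many knobs (illustrative names only)."""
        gamma: float = 0.99        # discount factor
        polyak: float = 0.995      # target-network averaging coefficient
        pi_lr: float = 1e-3        # policy learning rate
        q_lr: float = 1e-3         # Q-function learning rate
        batch_size: int = 100
        start_steps: int = 10000   # steps of uniform-random exploration at the start
        # ...the remaining knobs, each with a sensible default...

    def cheap_debug_config() -> DDPGConfig:
        """Special-purpose factory: tiny settings for fast smoke tests."""
        return DDPGConfig(batch_size=32, start_steps=500)

    # Build from a factory, then customize only what differs:
    cfg = replace(cheap_debug_config(), pi_lr=3e-4)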


You can refactor it that way, but then you make it unnecessarily more complicated.


Hello! Spinning Up author here.

Very reasonable point: it isn't clearly explained why you need to store logp_pi in the buffer. The reason is that calculating it on the fly later would require additional code complexity. The likelihood ratio requires the denominator to be computed under the _old_ policy, so to compute it on the fly, you would need to keep a second policy in the computation graph to preserve the old policy while you change the current one. You could not simply apply a stop_gradient to the current policy and get the same results.
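
Here's a minimal sketch of the pattern in TF1-style code (variable names illustrative, not the exact Spinning Up source):

    import numpy as np
    import tensorflow as tf

    obs_ph = tf.placeholder(tf.float32, (None, 4))
    act_ph = tf.placeholder(tf.float32, (None, 2))
    logp_old_ph = tf.placeholder(tf.float32, (None,))  # fed with logp_pi from the buffer

    # Toy diagonal-Gaussian policy standing in for the real one.
    mu = tf.layers.dense(obs_ph, 2)
    log_std = tf.Variable(-0.5 * np.ones(2, dtype=np.float32))
    z = (act_ph - mu) / tf.exp(log_std)
    logp = -0.5 * tf.reduce_sum(z**2 + 2 * log_std + np.log(2 * np.pi), axis=1)

    # The denominator enters through a placeholder, i.e. as a constant snapshot
    # of the old policy, so gradients flow only through the numerator:
    ratio = tf.exp(logp - logp_old_ph)

    # By contrast, tf.exp(logp - tf.stop_gradient(logp)) pins the denominator
    # to the *current* policy at every update step, not the policy that
    # collected the data, so the two objectives diverge after the first
    # gradient step.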

My personal feeling is that tutorial-style explanations like this don't fit nicely into code comment flow. As a result, most tutorial-style descriptions went into material on the Spinning Up website rather than into the code. It isn't 100% comprehensive, certainly, but RL has an enormous surface area (there are tons and tons of little details that teaching material could dive into) and I feel pretty good about what we were able to cover. :)


Thank you for responding. Well, my point is that the gradient of the likelihood ratio in particular is what trips people up. They ask questions like 'why is this ratio not always 1?' or similar. This is why I would say that explaining what goes where here is critical, i.e. that we save the prior logp_pi (even though we could recompute it) so that it is treated as a constant when computing the ratio and its gradient. That would be, from my perspective, the key pedagogical moment of a PPO tutorial. However, this is purely subjective, and I agree that one can feel differently about where to put explanations.
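
To illustrate with plain numbers (made-up values, just to show the mechanics):

    import numpy as np

    logp_old = -1.2                     # stored at collection time; now just a constant
    logp_per_step = [-1.2, -1.0, -0.7]  # logp of the same action as theta changes
                                        # over successive PPO gradient steps

    for step, logp in enumerate(logp_per_step):
        print(step, np.exp(logp - logp_old))  # 1.0, then ~1.22, then ~1.65

    # The ratio starts at exactly 1 (new policy == old policy) and drifts as
    # the parameters update, which is what lets PPO's clipping term activate.
    # If the denominator were recomputed from the current policy (e.g. via
    # stop_gradient), the printed ratio would be 1.0 forever and the clipping
    # would never bite.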


Hi! Keep an eye on our blog and Twitter accounts. We announced this workshop back in November (in the original blog post for Spinning Up, located here: https://blog.openai.com/spinning-up-in-deep-rl/), and tweeted about a deadline extension for the workshop back in December.

We are still picking a date for the second workshop, and will announce it as soon as we can (and with a reasonable amount of lead time).

Cheers!


The guide doesn't assume access to specialized hardware. Experiments and iterations with all of those algorithms can be done on a normal CPU that anyone has. :)

Quote:

"Iterate fast in simple environments. To debug your implementations, try them with simple environments where learning should happen quickly, like CartPole-v0, InvertedPendulum-v0, FrozenLake-v0, and HalfCheetah-v2 (with a short time horizon—only 100 or 250 steps instead of the full 1000) from the OpenAI Gym. Don’t try to run an algorithm in Atari or a complex Humanoid environment if you haven’t first verified that it works on the simplest possible toy task. Your ideal experiment turnaround-time at the debug stage is <5 minutes (on your local machine) or slightly longer but not much. These small-scale experiments don’t require any special hardware, and can be run without too much trouble on CPUs."


Hi!

A few people had this question on Twitter also. Our response: "Several of us at OpenAI are thinking seriously about how to make something like this happen! I can't promise anything, but we definitely want to remove barriers to entry." (https://twitter.com/jachiam0/status/1060595172285632512)

In the meantime, you can still use Spinning Up with the Classic Control and Box2D envs in Gym (which don't require any licenses at all). And what's more: for most of these environments you don't need a GPU! CPU is fine.
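
For example, a rough sketch of launching PPO from a script on one of those envs (check the docs for the exact arguments):

    import gym
    import tensorflow as tf
    from spinup import ppo

    env_fn = lambda: gym.make('LunarLander-v2')   # Box2D: free, no MuJoCo license
    ac_kwargs = dict(hidden_sizes=[64, 64], activation=tf.nn.relu)
    ppo(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=4000, epochs=50)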


Thank you for replying promptly. I am willing to help with such a change, if planned. Meanwhile I'll get started with running spinning-up on my laptop.


No, but please open an issue on GitHub and we'll look into making one! :)

https://github.com/openai/spinningup/issues/new


Let us know by opening an issue on GitHub: https://github.com/openai/spinningup/issues/new


Perfect. Thanks.


Hi! Primary developer for Spinning Up here. The code for this was developed mostly in June and July this year, and Eager still felt relatively new to me. I wanted to wait for Eager to stabilize and hit maturity before investing in it. I also wanted to see how TF would change on the road to TF 2.0, since that could change the picture even more.

At the six-month review in 2019, we'll evaluate whether it makes sense to rewrite the implementation examples for TF 2.0. I'll speculate that the answer will be "yes, it does." Since Eager execution will be a central feature of TF 2.0, the (probable) revamp of Spinning Up will include it.

Good luck with your experiments! And please let us know about your experience with Spinning Up; we want to make sure it fulfills the mission of helping anyone learn about deep RL, and user feedback is vital for that.


Thank you for sharing your thought process!

