Does this mean that ML on FPGAs will be more common? Can someone comment on the viability of this? Would there be a speedup, and if so, would it be large enough to warrant rewriting it all in VHDL/Verilog?
Yes, definitely, to your first question and your last two!
That said, it's not viable enough to spark a large-scale FPGA movement anytime soon, since industry and academia are heavily invested in GPUs. The GPU software stack, like CUDA, TensorFlow, and other open source libraries, is very mature and heavily optimized. There will have to be equivalent libraries in Verilog. (I for one have been hoping to be part of this movement for some time now, so I'd love it if anyone could point me to anything going on.)
There are hurdles ranging from major to minor. Although some of them might not seem like much [0], here they are:
1. So far, deep learning/machine learning researchers have been fine learning the GPU software stack, and there are widespread tutorials on how to get started. Verilog/VHDL is a whole different ball game and a very different thought process. (I'll address OpenCL later.)
2. The FPGA toolchain isn't open source and isn't really hackable. Even though that matters less here, since you're writing gates from scratch, you'll still run into licensing problems and bugs that get fixed at a snail's pace (if ever), at least until a performant open source toolchain appears (if ever, though I have hope in the community). You'll have to learn to give up on a customer service rep when you try to get help, unlike open source libraries, where you head to the GitHub issues page and get help quickly from the main devs.
3. Although this move makes getting into the game a lot easier, it doesn't change the fact that people want control over their devices, and it will take time for companies to realize they need to start buying FPGAs for their data centers and using them in production, which has to happen sometime soon. Using AWS won't be cost effective for long-term usage, just like GPU instances. (I don't know how the spot instance situation is going to look for the FPGA instances.)
This comes with its own slew of software problems, and good luck figuring out what's breaking what, given the much slower compilation times and terribly unhelpful debugging messages.
4. OpenCL to FPGA is a mess. Only a handful of FPGAs support OpenCL, which has led to little to no open source development around OpenCL with FPGAs in mind. And no, the OpenCL libraries written for GPUs cannot just be reused on FPGAs; it's more like a from-scratch rewrite, and there's a LOT of tweaking needed to get them to work (rough sketch below). OpenCL to FPGA is not as seamless as one might think and is riddled with problems. This will, again, take time and energy from people familiar with FPGAs, who have largely been outside the OSS movement.
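To make that last point concrete, here's a rough, hypothetical sketch (OpenCL C, not tied to any real project) of the kind of restructuring involved. The first kernel is the NDRange style most GPU code ships in; the second is the single work-item style that FPGA OpenCL compilers (e.g. the Intel/Altera SDK) generally prefer, where the compiler pipelines an explicit loop and pragmas like unroll trade area for throughput. The kernel names and the unroll factor are made up for illustration:

    /* GPU-style NDRange kernel: launch thousands of work-items,
       each handling one element. This is how most GPU OpenCL code is written. */
    __kernel void vadd_gpu(__global const float *a,
                           __global const float *b,
                           __global float *c)
    {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }

    /* Hypothetical FPGA-oriented rewrite: a single work-item kernel with an
       explicit loop that the FPGA compiler turns into a deep pipeline. The
       partial unroll trades logic area for throughput; the right factor
       depends on the device, and vendor-specific extensions (channels,
       memory attributes) usually follow once you go beyond this toy example. */
    __kernel void vadd_fpga(__global const float * restrict a,
                            __global const float * restrict b,
                            __global float * restrict c,
                            const int n)
    {
        #pragma unroll 8
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

Nothing exotic here, but multiply this by every kernel in a real library and you can see why "just use OpenCL" doesn't get you a free FPGA port.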
Although I might come off as pessimistic, I'm largely hopeful for the future of the FPGA space. This move isn't great news just because it lowers the barrier to entry; it also puts one chip in front of a much larger audience, so libraries finally have a single target to focus their support on, compared to before, when every dev had a different board. So you'll want to get familiar with this one: the Virtex UltraScale+ XCVU9P [1]
Also, what might be interesting to you: Microsoft is doing a LOT of research on this.
I think the articles on MS's use of FPGAs explain it better than I can in this comment.
I'd suggest starting with the Wired article or MS's blog post. Exciting stuff.
[0]: Remember that academia adjusts to the latest and greatest software at a much slower pace than your average developer. The reason CUDA is still so popular, even though it's closed source and only runs on Nvidia's GPUs, is that it got into the game first and wooed researchers with performance. Although OpenCL is comparably performant (with some rare cases where this isn't true), I still see CUDA regarded as the de facto language to learn in the GPGPU space.