I respectfully claim that statement is wrong. Some of the most commonly used stream ciphers, such as A5/1, E0, and SNOW 3G, are explicitly designed to be efficiently implemented in HW.
Further, if you look at the eSTREAM portfolio, the Profile 2 algorithms can be implemented very efficiently in HW. And to be honest, the Profile 1 algorithms can also be implemented efficiently in HW. I have implemented them all in HW and get good performance.
The stream cipher HC-128/256, for example, is very fast in SW. But in HW I can parallelize the state reads and updates in ways you can't in SW, due to the lack of multiple read and write ports. Doing this, you get multiple Gbps of performance in HW even at a low clock frequency.
If you look at the stream cipher RC4, it was not designed for HW implementation. But in HW I can implement RC4 to do three reads and two updates in parallel and reach 1 cycle/byte. In a low-cost FPGA I reach 500 Mbps, which is pretty OK. Not that I'm promoting the use of RC4. My implementation was just an experiment to see if such a parallel implementation was possible. Oh, and it is not debugged, so don't use it anyway. ;-)
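
For reference, here is a minimal software sketch of the standard RC4 keystream loop, with the three state reads and two state writes per output byte marked in comments. It is a plain Python model for illustration only, not the HW design described above, and the function names `rc4_ksa` and `rc4_keystream` are made up for this example. In SW these five accesses happen one after another; a multi-ported state memory in HW is what would let them overlap.

```python
def rc4_ksa(key):
    """Standard RC4 key scheduling: returns the initial 256-byte state."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    return S


def rc4_keystream(key, n):
    """Generate n keystream bytes, spelling out each state access."""
    S = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        a = S[i]                        # read 1: address known in advance
        j = (j + a) & 0xFF
        b = S[j]                        # read 2: address depends on read 1
        S[i] = b                        # write 1 (first half of the swap)
        S[j] = a                        # write 2 (second half of the swap)
        out.append(S[(a + b) & 0xFF])   # read 3: performed after the swap
    return bytes(out)


if __name__ == "__main__":
    # Widely published test vector: key "Key" starts with EB 9F 77 81 ...
    print(rc4_keystream(b"Key", 9).hex())
```

Note that read 3 can hit the same address as one of the two writes, so a parallel design would presumably need some form of bypass for that case; the sequential model above gets it right simply by doing the read last.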