For x87, you can use FLDZ to load `+0.0` onto the register stack. My guess is th...

titzer · on Nov 4, 2021

> For x87

Should read: for x87, unless you are doing this for fun, shoot yourself in the face and instead use SSE because no one in their right mind should be writing x87 code in 2021!

Please help this architectural misfeature die, which should have happened decades ago.

gpderetta · on Nov 4, 2021

For some operations 80 bit accumulators are still useful, so a mix of SSE and x87 might be optimal.
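A rough illustration of why a wider accumulator can help. Python has no portable 80-bit type, so this sketch uses float32 data with a float64 accumulator as a stand-in for double data with an 80-bit x87 accumulator. The helper `to_f32` and the choice of summing `0.1` are assumptions made for the example, not anything from the thread.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

term = to_f32(0.1)   # the input data, already in the narrow format
n = 100_000

# Narrow accumulator: the running sum is rounded to float32 after every add.
narrow = 0.0
for _ in range(n):
    narrow = to_f32(narrow + term)

# Wide accumulator: the running sum stays in float64 the whole time.
wide = 0.0
for _ in range(n):
    wide += term

# n * term is exactly representable in float64 for these values,
# so it serves as the reference sum of the inputs.
reference = n * term
print(f"narrow accumulator error: {abs(narrow - reference):.6f}")
print(f"wide accumulator error:   {abs(wide - reference):.6f}")
```

The inputs are identical in both loops; only the precision of the running sum differs.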

woodruffw · on Nov 4, 2021

> Should read: for x87, unless you are doing this for fun, shoot yourself in the face and instead use SSE because no one in their right mind should be writing x87 code in 2021!

My information is pretty old, but IIRC x87 still has specialized instructions for sin/cos/tan that are sometimes more performant than their equivalent implementations in SSE. x87 instructions are also very small, so trig-heavy workloads where I$ is a measurable performance component might unfortunately still be a good fit for x87.

adrian_b · on Nov 4, 2021

On recent Intel/AMD CPUs the x87 transcendental functions are usually much slower than their equivalents using SSE/AVX instructions.

For future CPUs, it is expected that the performance gap will increase.

The x87 trigonometric functions typically require between 100 and 200 clock cycles. During that time a recent CPU can execute 200 to 500 instructions, enough to compute many values of a trigonometric function (using a polynomial approximation). When SIMD instructions can be used, several tens of values of a function could be computed during a single x87 instruction.
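A minimal sketch of the kind of polynomial approximation mentioned above, assuming the argument has already been reduced to [-pi/4, pi/4]. It uses plain Taylor coefficients for simplicity; production math libraries generally use tuned (minimax) coefficients instead, and the names `COEFFS` and `sin_poly` are made up for this example.

```python
import math

# Taylor coefficients of sin(x)/x in powers of x*x: 1, -1/3!, 1/5!, ...
# Seven terms, so the polynomial goes up to x**13.
COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(7)]

def sin_poly(x):
    """Approximate sin(x) for |x| <= pi/4 using Horner's rule."""
    x2 = x * x
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * x2 + c   # one multiply and one add per coefficient
    return x * acc

# Compare against the library sine on a grid over the reduced range.
xs = [i * (math.pi / 4) / 100 for i in range(-100, 101)]
max_err = max(abs(sin_poly(x) - math.sin(x)) for x in xs)
print(f"max abs error on [-pi/4, pi/4]: {max_err:.2e}")
```

The loop body is a short chain of multiplies and adds with no branches, which is the shape of code that maps well onto SIMD lanes.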

lifthrasiir · on Nov 4, 2021

x87 trig instructions are pretty inaccurate even in the supported range mainly because of faulty range reduction [1] and can't be vectorized at all. If you have trig-heavy workloads nowadays you would want SIMD libm, not x87.

[1] http://notabs.org/fpuaccuracy/
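A small sketch of how an imprecise pi in the range reduction hurts accuracy near a multiple of pi. It assumes the commonly reported detail that the x87 reduction uses pi rounded to a 66-bit significand; that figure, and the names `PI_66` and `reduced_near_pi`, are assumptions of this example rather than something stated in the thread. Exact rational arithmetic stands in for the hardware.

```python
import math
from fractions import Fraction

# pi to 50 decimal places, far more than either reduction below needs.
PI_EXACT = Fraction("3.14159265358979323846264338327950288419716939937510")

# Assumption: the reduction uses pi rounded to a 66-bit significand.
# pi lies in [2, 4), so the last bit of such a value has weight 2**-64.
PI_66 = Fraction(round(PI_EXACT * 2**64), 2**64)

def reduced_near_pi(x, pi_value):
    """Return pi_value - x exactly; for x very close to pi, sin(x) ~ pi - x."""
    return pi_value - Fraction(x)

x = math.pi  # the double nearest to pi, not pi itself
exact = reduced_near_pi(x, PI_EXACT)
approx = reduced_near_pi(x, PI_66)
rel_err = float(abs(approx - exact) / exact)

print(f"sin(x) with accurate pi: {float(exact):.17e}")
print(f"sin(x) with 66-bit pi:   {float(approx):.17e}")
print(f"relative error:          {rel_err:.3e}")
```

The subtraction cancels almost all of the leading bits, so the small error in the stored pi ends up dominating the result.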

cogman10 · on Nov 4, 2021

Why can't the trig instructions be vectorized? I've never quite understood why SSE/AVX didn't add trig functions to finally kill off any argument for using x87.

adrian_b · on Nov 4, 2021

Unlike simple operations like a floating-point multiplication, trigonometric functions and the other transcendental functions are too complex, so they must be split into many steps.

If a trigonometric function is encoded as a single instruction, then it must launch a microprogram, to execute the many required steps.

A microprogram cannot execute faster than the same steps encoded as separate instructions, so the only advantage of encoding a trigonometric function in a single instruction would be reduced program size. Most programs contain few trigonometric functions, so the reduction in program size is not worthwhile.

While a microprogrammed trigonometric function could be as fast as the equivalent sequence of instructions, in reality it is usually much slower.

The reason is that the modern CPUs are optimized for the most frequent instructions and they dedicate a minimum of resources for the seldom used microprogrammed instructions, so these hit various limitations that do not exist for the simple instructions. The microprogrammed instructions usually have some phases whose execution cannot be overlapped in time with other instructions, which leads to lower performance.

jabl · on Nov 4, 2021

There exist libraries that provide SIMD versions (on the fast path) of math functions, e.g. https://sourceware.org/glibc/wiki/libmvec

cogman10 · on Nov 4, 2021

Gotcha, hence the reason they've added instructions like FMA. Those don't require microprograms but do make things like calculating a Taylor series faster. Right?
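To make the FMA point concrete: each step of Horner's rule has the form `acc * x + c`, which is exactly one fused multiply-add, rounded once instead of twice. The sketch below emulates that single rounding with exact rational arithmetic so it does not depend on hardware or a particular Python version; `fma_emulated` and `horner_fma` are hypothetical helper names, and real code would use the hardware instruction.

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """Compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def horner_fma(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... with one fused step per term."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = fma_emulated(acc, x, c)
    return acc

# A case where the single rounding is visible.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
unfused = a * b - 1.0              # a*b rounds to 1.0 first, so this is 0.0
fused = fma_emulated(a, b, -1.0)   # keeps the exact product: -2**-60

print(f"unfused: {unfused!r}")
print(f"fused:   {fused!r}")
print(f"1 + 2x + 3x^2 at x=2: {horner_fma([1.0, 2.0, 3.0], 2.0)}")
```

So a degree-n polynomial costs n fused operations, each a simple pipelined instruction on CPUs that support FMA.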

Tuna-Fish · on Nov 4, 2021

> My information is pretty old, but IIRC x87 still has specialized instructions for sin/cos/tan that are sometimes more performant than their equivalent implementations in SSE.

They absolutely are not. If you accept the same, crappy precision, you can do an estimate in just a few cycles instead of the 60+ that the instructions take.