Nice. Reminds me of an optimisation trick from years ago: I was bottlenecked by one of these trigonometric functions while working with a probabilistic data structure, then figured out the input domain was pretty small (a couple dozen values), so I precomputed those values and used an array lookup instead. A huge win in terms of perf, though obviously only applicable in extreme cases like that.
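Roughly, the trick looks like this (a minimal sketch in Python with made-up names and values, not the actual code, which I no longer have):

    import math

    # The input domain was small and known up front (a few dozen values),
    # so compute sin once at startup...
    ANGLES = [i * math.pi / 32 for i in range(64)]   # illustrative inputs
    SIN_TABLE = [math.sin(a) for a in ANGLES]        # precomputed results

    def fast_sin(index: int) -> float:
        # ...and the hot path becomes a plain array lookup.
        return SIN_TABLE[index]

The whole point is that the inputs are a small, discrete set, so the lookup replaces the trig call entirely; it obviously doesn't generalise to arbitrary arguments.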
To put some numbers on it, using N terms of the Taylor series for sin(x) [1] with |x| <= 0.1, the maximum error percentage cannot exceed [2]:
N   Error Limit
1   0.167%          (1/6%)
2   8.35 x 10^-5%   (1/11980%)
3   1.99 x 10^-8%   (1/50316042%)
4   2.76 x 10^-12%  (1/362275502328%)
Even for |x| as large as 1 the sin(x) = x approximation is within 20%, which is not too bad when you are just doing a rough ballpark calculation, and a two term approximation gets you under 1%. Three terms is under 0.024% (1/43%).
For |x| up to Pi/2 (90°) the one term approximation falls apart. The guarantee from the Taylor series is within 65% (in reality it does better, but is still off by 36%). Two terms is guaranteed to be within 8%, three within 0.5%, and four within 0.02%.
Here's a quick and dirty little Python program to compute a table of error bounds for a given x [3].
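The program isn't reproduced here, but a rough sketch of what such a script could look like (my reconstruction using the first-omitted-term bound from [2], not the original code):

    import math

    def sin_error_bounds(x, max_terms=4):
        # Worst-case relative error of the N-term Taylor approximation of
        # sin(x), bounded by the magnitude of the first omitted term.
        exact = math.sin(x)
        print("N  error bound (relative)")
        for n in range(1, max_terms + 1):
            k = 2 * n + 1                        # power of the first omitted term
            bound = x ** k / math.factorial(k)   # |x^k / k!|
            print(f"{n}  {100 * bound / exact:.3g}%")

    sin_error_bounds(0.1)

For x = 0.1 this reproduces the table above.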
[1] x - x^3/3! + x^5/5! - x^7/7! + ...
[2] In reality the error will usually be quite a bit below this upper limit. The Taylor series for a given x is a convergent alternating series. That is, it is of the form A0 - A1 + A2 - A3 + ... where all the A's have the same sign, |Ak| is a decreasing sequence past some particular k, and |Ak| has a limit of 0 as k goes to infinity. Such a series converges to some value, and if you approximate that by just taking the first N terms the error cannot exceed the first omitted term as long as N is large enough to take you to the point where the sequence from there on is decreasing. This is the upper bound I'm using above.
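As a quick numerical check of that bound (my own snippet, not from the original comment), the actual truncation error for x = 0.1 with two terms does indeed sit just below the first omitted term:

    import math

    def taylor_sin(x, terms):
        # Sum the first `terms` terms of the alternating series for sin(x).
        return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
                   for k in range(terms))

    x, n = 0.1, 2
    actual = abs(math.sin(x) - taylor_sin(x, n))
    bound = x ** (2 * n + 1) / math.factorial(2 * n + 1)   # first omitted term
    print(actual <= bound)   # True
    print(actual, bound)     # ~8.331e-08 vs ~8.333e-08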
The sin(x) = x approximation is actually exact (in terms of doubles) for |x| < 2^-26 = 1.49e-8. See also [1]. This happens to cover 48.6% of all doubles.
Similarly, cos(x) = 1 for |x| < 2^-27 = 7.45e-9 (see [2]).
Finally, sin(double(pi)) tells you the error in double(pi) - that is, how far the double representation of pi is away from the "real", mathematical pi [3].
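These are easy to sanity-check in Python, whose floats are IEEE doubles (this snippet is mine, not part of the comment above, and assumes the platform's libm handles these tiny arguments in the usual way):

    import math

    print(math.sin(1e-8) == 1e-8)   # True: 1e-8 is below 2^-26 = 1.49e-8
    print(math.cos(7e-9) == 1.0)    # True: 7e-9 is below 2^-27 = 7.45e-9
    print(math.sin(math.pi))        # ~1.2246e-16, the gap between pi and double(pi)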
That is precisely the technique discussed in the article: it's the first term of the Taylor expansion. Except that the article used more terms of the expansion, and also used very slightly "wrong" coefficients to improve the overall accuracy within the small region.
That's what I assumed would have been a reasonable optimization!
What I found really amazing was that rather than reducing the number of multiplications to 2 (to calculate x^3), you can reduce it to 0 and it still does surprisingly well.
Tangential at best, but why was the 'r' dropped from that term? Or why not call it caching? Why the weird "memo-ization"? It makes me think of a mass extinction event where everything is turned into a memo.
It's explained right in the linked Wikipedia page:
> The term "memoization" was coined by Donald Michie in 1968 and is derived from the Latin word "memorandum" ("to be remembered"), usually truncated as "memo" in American English, and thus carries the meaning of "turning [the results of] a function into something to be remembered". While "memoization" might be confused with "memorization" (because they are etymological cognates), "memoization" has a specialized meaning in computing.
The term memoization is roughly as old as the word cache in its computing sense (obviously weapon caches are far older). Memoization was coined in 1968; the first commercial CPU cache appeared around the same time, on the IBM System/360 Model 85, as processors became significantly faster than main memory.
As Wikipedia outlines, the 'r' was dropped because of 'memo'. The word is derived from the Latin 'memorandum', which does contain the 'r' (just like 'memory'), but apparently it was meant more as an analogy to written memos.