Nice. Reminds me of an optimisation trick from years ago: I was bottlenecked by one of these trigonometric functions while working with a probabilistic data structure, then figured out the input domain was pretty small (a couple dozen values), so I precomputed those values and used an array lookup instead. A huge win in terms of perf, though obviously only applicable in extreme cases like that.
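Roughly, the trick looks like this (a minimal sketch in Python with made-up names and values, not the actual code, which I no longer have):

    import math

    # The input domain was small and known up front (a few dozen values),
    # so compute sin once at startup...
    ANGLES = [i * math.pi / 32 for i in range(64)]   # illustrative inputs
    SIN_TABLE = [math.sin(a) for a in ANGLES]        # precomputed results

    def fast_sin(index: int) -> float:
        # ...and the hot path becomes a plain array lookup.
        return SIN_TABLE[index]

The whole point is that the inputs are a small, discrete set, so the lookup replaces the trig call entirely; it obviously doesn't generalise to arbitrary arguments.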
To put some numbers on it, using N terms of the Taylor series for sin(x) [1] with |x| <= 0.1, the maximum error percentage cannot exceed [2]:
N   Error Limit
1   0.167%          (1/6%)
2   8.35 x 10^-5%   (1/11980%)
3   1.99 x 10^-8%   (1/50316042%)
4   2.76 x 10^-12%  (1/362275502328%)
Even for |x| as large as 1 the sin(x) = x approximation is within 20%, which is not too bad when you are just doing a rough ballpark calculation, and a two term approximation gets you under 1%. Three terms is under 0.024% (1/43%).
For |x| up to Pi/2 (90°) the one term approximation falls apart. The guarantee from the Taylor series is within 65% (in reality it does better, but is still off by 36%). Two terms is guaranteed to be within 8%, three within 0.5%, and four within 0.02%.
Here's a quick and dirty little Python program to compute a table of error bounds for a given x [3].
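The program isn't reproduced here, but a rough sketch of what such a script could look like (my reconstruction using the first-omitted-term bound from [2], not the original code):

    import math

    def sin_error_bounds(x, max_terms=4):
        # Worst-case relative error of the N-term Taylor approximation of
        # sin(x), bounded by the magnitude of the first omitted term.
        exact = math.sin(x)
        print("N  error bound (relative)")
        for n in range(1, max_terms + 1):
            k = 2 * n + 1                        # power of the first omitted term
            bound = x ** k / math.factorial(k)   # |x^k / k!|
            print(f"{n}  {100 * bound / exact:.3g}%")

    sin_error_bounds(0.1)

For x = 0.1 this reproduces the table above.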
[1] x - x^3/3! + x^5/5! - x^7/7! + ...
[2] In reality the error will usually be quite a bit below this upper limit. The Taylor series for a given x is a convergent alternating series. That is, it is of the form A0 - A1 + A2 - A3 + ... where all the A's have the same sign, |Ak| is a decreasing sequence past some particular k, and |Ak| has a limit of 0 as k goes to infinity. Such a series converges to some value, and if you approximate that by just taking the first N terms the error cannot exceed the first omitted term as long as N is large enough to take you to the point where the sequence from there on is decreasing. This is the upper bound I'm using above.
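As a quick numerical check of that bound (my own snippet, not from the original comment), the actual truncation error for x = 0.1 with two terms does indeed sit just below the first omitted term:

    import math

    def taylor_sin(x, terms):
        # Sum the first `terms` terms of the alternating series for sin(x).
        return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
                   for k in range(terms))

    x, n = 0.1, 2
    actual = abs(math.sin(x) - taylor_sin(x, n))
    bound = x ** (2 * n + 1) / math.factorial(2 * n + 1)   # first omitted term
    print(actual <= bound)   # True
    print(actual, bound)     # ~8.331e-08 vs ~8.333e-08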
The sin(x) = x approximation is actually exact (in terms of doubles) for |x| < 2^-26 = 1.49e-8. See also [1]. This happens to cover 48.6% of all doubles.
Similarly, cos(x) = 1 for |x| < 2^-27 = 7.45e-9 (see [2]).
Finally, sin(double(pi)) tells you the error in double(pi) - that is, how far the double representation of pi is away from the "real", mathematical pi [3].
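These are easy to sanity-check in Python, whose floats are IEEE doubles (this snippet is mine, not part of the comment above, and assumes the platform's libm handles these tiny arguments in the usual way):

    import math

    print(math.sin(1e-8) == 1e-8)   # True: 1e-8 is below 2^-26 = 1.49e-8
    print(math.cos(7e-9) == 1.0)    # True: 7e-9 is below 2^-27 = 7.45e-9
    print(math.sin(math.pi))        # ~1.2246e-16, the gap between pi and double(pi)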
That is precisely the technique discussed in the article: it's the first term of the Taylor expansion. Except that the article used more terms of the expansion, and also used very slightly "wrong" coefficients to improve the overall accuracy within the small region.
That's what I assumed would have been a reasonable optimization!
What I found really amazing was that rather than reducing the number of multiplications to 2 (to calculate x^3), you can reduce it to 0 and it still does surprisingly well.
Tangential at best, but why was the 'r' dropped from that term? Or why not call it caching? Why the weird "memo-ization"? It makes me think of a mass extinction event where everything is turned into a memo.
It's explained right in the linked Wikipedia page:
> The term "memoization" was coined by Donald Michie in 1968 and is derived from the Latin word "memorandum" ("to be remembered"), usually truncated as "memo" in American English, and thus carries the meaning of "turning [the results of] a function into something to be remembered". While "memoization" might be confused with "memorization" (because they are etymological cognates), "memoization" has a specialized meaning in computing.
The term memoization is roughly as old as the word cache in its computing sense (obviously weapon caches are far older). Memoization was coined in 1968; the first commercial CPU cache appeared around the same time, on the IBM System/360 Model 85, as processors became significantly faster than main memory.
As Wikipedia outlines, the 'r' was dropped because of 'memo'. The word is derived from the Latin 'memorandum', which does contain the 'r' (just like 'memory'), but apparently it was meant more as an analogy to written memos.