
I think the idea is that the derivative of x^2 is linear (namely 2x), which speeds up training/backpropagation, since evaluating 2x is much cheaper than dsigma(x)/dx, dGELU(x)/dx, etc., and also speeds up the forward pass, since x*x is cheaper than sigma(x) et cetera. But if so, they didn't explain it well, or possibly at all. And "nonlinear activation" is still nonsense.
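As a rough sanity check, here's a minimal timing sketch (assuming PyTorch is available; the tensor size and iteration count are arbitrary choices for illustration, not anything from the paper) that times forward plus backward through x*x, ReLU, and GELU in isolation:

    import time
    import torch
    import torch.nn.functional as F

    x = torch.randn(4096, 4096, requires_grad=True)

    def bench(name, fn, iters=20):
        # Time forward + backward through the activation alone.
        start = time.perf_counter()
        for _ in range(iters):
            y = fn(x).sum()
            y.backward()
            x.grad = None  # reset accumulated gradient between runs
        elapsed = time.perf_counter() - start
        print(f"{name:8s} {elapsed / iters * 1e3:.2f} ms/iter")

    bench("square", lambda t: t * t)   # derivative is 2x
    bench("relu",   torch.relu)        # derivative is a 0/1 mask
    bench("gelu",   F.gelu)            # derivative involves erf/tanh

Actual numbers depend heavily on hardware and tensor size, so treat the output as illustrative only.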


It's no faster than ReLU in either direction (forward or backward), and ReLU is the most common choice.


Usually activation functions are not a bottleneck.
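For scale, a hedged sketch (again assuming PyTorch; sizes are arbitrary and chosen only for illustration) comparing one matmul against one elementwise GELU on same-sized data:

    import time
    import torch
    import torch.nn.functional as F

    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    def time_it(name, fn, iters=10):
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        elapsed = time.perf_counter() - start
        print(f"{name:10s} {elapsed / iters * 1e3:.2f} ms/iter")

    time_it("matmul", lambda: a @ b)      # O(n^3) work for n x n matrices
    time_it("gelu",   lambda: F.gelu(a))  # O(n^2) elementwise work

The matmuls that surround the activation do asymptotically more work, which is why the choice of activation rarely shows up in end-to-end timings.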



