> I don't have luajit to validate this, but changing the C test to dlsym() the symbol first and use that in the loop instead makes the test take 727ms instead of 896ms when going through the PLT on my machine.
Isn't this what every C or C++ library that wraps another with dlopen/dlsym does?