You seem to be conflating "numerical computing" with machine learning. Numerical computing typically means solving PDEs via e.g. finite elements or finite differences, or solving the large systems of linear equations associated with such methods.
The difference between the two is that when solving PDEs, accuracy is paramount, so even using single precision is a bit of a compromise, whereas in machine learning, the trend seems to be to use half-precision or lower, sacrificing accuracy for speed.
For classical numerical computing, e.g. solving PDEs or linear equations, Java may not be the best choice; see for instance the following paper:
How Java's Floating-Point Hurts Everyone Everywhere:
FWIW, we use BLAS just like MATLAB, NumPy, R, and Julia.
The speed depends on the BLAS implementation.
Nd4j has a concept of "backends". Backends allow us to sub in different BLAS implementations as well as different ways of doing operations.
We have a data buffer type that allows people to specify single or double precision. Those data buffers then have an allocation type that can be JavaCPP pointers, NIO byte buffers (direct/off-heap), or normal Java arrays.
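Conceptually the split looks like this (an illustrative sketch using plain JDK types, not Nd4j's actual DataBuffer API):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.DoubleBuffer;

    // Sketch: the element type (float vs. double) is chosen independently of
    // where the memory lives (off-heap/direct NIO buffer vs. on-heap array).
    public class DataBufferSketch {
        public static void main(String[] args) {
            int length = 1_000_000;

            // Off-heap (direct) allocation: lives outside the GC-managed heap,
            // and its memory can be handed to native BLAS routines.
            ByteBuffer direct = ByteBuffer.allocateDirect(length * Double.BYTES)
                                          .order(ByteOrder.nativeOrder());
            DoubleBuffer offHeap = direct.asDoubleBuffer();
            offHeap.put(0, 3.14);

            // On-heap allocation: a normal Java array, simplest to use but it
            // has to be copied or pinned before native code can touch it.
            double[] onHeap = new double[length];
            onHeap[0] = 3.14;

            System.out.println(offHeap.get(0) + " " + onHeap[0]);
        }
    }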
We are also currently working on surpassing the JVM's memory limits ourselves via JavaCPP's pointers. That gives us the 64-bit addressing people normally have access to in C++.
Every current JVM matrix lib that uses netlib-java or jblas is also going to have problems with JVM-to-native communication. The reason is that passing Java arrays and byte buffers across JNI is slower than how we handle it, which is by passing around longs (raw pointer addresses) that are either accessed via Unsafe or allocated in JNI, where we retain the pointer address directly. We expose that to our native operations.
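As an illustration of the raw-address idea (a standalone sketch, not our actual code path):

    import sun.misc.Unsafe;
    import java.lang.reflect.Field;

    // Sketch: allocate memory outside the GC heap and refer to it only by its
    // raw 64-bit address, which is what a native BLAS kernel ultimately wants.
    public class RawPointerSketch {
        private static final Unsafe UNSAFE = loadUnsafe();

        private static Unsafe loadUnsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }

        public static void main(String[] args) {
            long n = 4;
            long address = UNSAFE.allocateMemory(n * Double.BYTES); // a plain long
            try {
                for (long i = 0; i < n; i++) {
                    UNSAFE.putDouble(address + i * Double.BYTES, i * 1.5);
                }
                // A JNI call would receive 'address' as a jlong and cast it to
                // double* on the C++ side; here we just read it back.
                System.out.println(UNSAFE.getDouble(address + 2 * Double.BYTES));
            } finally {
                UNSAFE.freeMemory(address);
            }
        }
    }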
We are solving this by writing our own C++ backend, libnd4j, which supports CUDA as well as normal OpenMP-optimized for-loops for computation.
We also offer a unified interface to cuBLAS and CBLAS (the latter implemented by OpenBLAS as well as MKL).
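Roughly the kind of signature that unified layer exposes (simplified and hypothetical, names made up):

    // Hypothetical sketch, not the real Nd4j API: one GEMM entry point that a
    // CPU backend routes to cblas_dgemm (OpenBLAS/MKL) and a GPU backend routes
    // to cublasDgemm, with buffers identified by raw addresses as above.
    public interface Gemm {
        // C := alpha * A * B + beta * C, column-major, no transposes for brevity
        void dgemm(int m, int n, int k,
                   double alpha, long aPtr, int lda,
                   long bPtr, int ldb,
                   double beta, long cPtr, int ldc);
    }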
FWIW, I more or less agree with you, but it doesn't mean it shouldn't exist.
JVM-based environments can bypass the JVM for the heavy numerics, just like Python does now.
The fundamental problem with the JVM is that no one has just taken what works on other platforms and mapped the concepts one to one.
Our idea with Nd4j is to not only allow people to write their own backends, but also to provide a sane default platform for numerical computing on the JVM. Things like garbage collection shouldn't be a hindrance for what is otherwise a great platform for bigger workloads (Hadoop, Spark, Kafka, ...).
In summary, we know the JVM has been bad at this till now; Nd4j is our hope for fixing that.
Looking at it, I don't see anything dealing with sparse matrices or factorization (e.g., LU, QR, SVD). All the Java libraries for SVD are pretty bad. Plus, none of your examples mention double precision. Does the library support it?
I find it interesting that the NumPy comparison doesn't mention which BLAS NumPy is linked against, but it does mention it for Nd4j. NumPy is highly dependent on a good BLAS, and the basic Netlib one isn't that great.
We actually appreciate feedback like this, thank you.
As for netlib-java: it links against any BLAS implementation you give it. It has this idea of a JNILoader, which can dynamically link against a native BLAS at runtime, with the reference BLAS you mentioned as the fallback.
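For example (this follows my reading of netlib-java's documented API, so treat the exact property and method names as approximate):

    import com.github.fommil.netlib.BLAS;

    // Sketch: netlib-java picks a BLAS implementation at runtime; you can force
    // one via a system property before first use, e.g.
    //   -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.NativeSystemBLAS
    //   -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS
    public class NetlibSketch {
        public static void main(String[] args) {
            BLAS blas = BLAS.getInstance();
            System.out.println("Using: " + blas.getClass().getName());

            // 2x2 column-major matrices: C := 1.0 * A * B + 0.0 * C
            double[] a = {1, 2, 3, 4};
            double[] b = {5, 6, 7, 8};
            double[] c = new double[4];
            blas.dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
            System.out.println(java.util.Arrays.toString(c)); // [23.0, 34.0, 31.0, 46.0]
        }
    }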
LAPACK doesn't implement any sparse linear algebra. If you think the landscape of "Java matrix libraries" is fragmented, when really they're all just different takes on wrapping BLAS and LAPACK or writing equivalent functionality in pure Java, wait until you look into sparse linear algebra libraries. There's no standard API; there are 3ish common and a dozen less common storage formats; only one or two of these libraries have any public version control or issue tracker whatsoever; licenses are all over the map. The whole field is a software engineering disaster, and yet it's functionality you just can't get anywhere else.
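To make the storage-format point concrete, here's Compressed Sparse Row (CSR), one of the common formats, as a minimal do-it-yourself sketch of a sparse matrix-vector product:

    // CSR sketch: nonzeros stored row by row, with an index array marking where
    // each row starts. Every library reinvents some variation of this.
    public class CsrSketch {
        final int rows;
        final double[] values;   // nonzero values, row by row
        final int[] colIndices;  // column index of each nonzero
        final int[] rowPtr;      // rowPtr[i]..rowPtr[i+1] spans row i's nonzeros

        CsrSketch(int rows, double[] values, int[] colIndices, int[] rowPtr) {
            this.rows = rows;
            this.values = values;
            this.colIndices = colIndices;
            this.rowPtr = rowPtr;
        }

        double[] multiply(double[] x) {
            double[] y = new double[rows];
            for (int i = 0; i < rows; i++) {
                double sum = 0.0;
                for (int j = rowPtr[i]; j < rowPtr[i + 1]; j++) {
                    sum += values[j] * x[colIndices[j]];
                }
                y[i] = sum;
            }
            return y;
        }

        public static void main(String[] args) {
            // [[1 0 2],
            //  [0 3 0]]
            CsrSketch m = new CsrSketch(2,
                    new double[]{1, 2, 3},
                    new int[]{0, 2, 1},
                    new int[]{0, 2, 3});
            System.out.println(java.util.Arrays.toString(
                    m.multiply(new double[]{1, 1, 1}))); // [3.0, 3.0]
        }
    }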
> However there are quite a few sparse blas and lapack implementations now.
There's the NIST Sparse BLAS, and MKL has a similar but not exactly compatible version. These never really took off in adoption (MKL's widely used of course, but I'd wager these particular functions are not). What sparse LAPACK are you talking about?
> If you want to help us fix it we are hiring ;).
We were at the same dinner a couple weeks ago actually. I'm enjoying where I am using Julia and LLVM, not sure if you could pay me enough to make me want to work on the JVM.
We have this concept of a backend; we had netlib-java and jblas backends, but both are horribly slow/limited and need some updating, which is why we added our own approach on top of this.
If a new matrix framework comes out, I will just write a backend for it and allow people to keep the same DSL.
ND4J supports n-dimensional arrays, while Breeze does not. jblas is great, but it's not fast enough. We've relied on the netlib BLAS in the past, and we're moving to faster computation libs now. We'd love for Spark MLlib to plug into ND4J at some point.
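To illustrate what n-dimensional support means under the hood (a generic sketch of row-major strided indexing, not Nd4j's actual code):

    // Sketch: an n-d array is one flat buffer plus shape/stride bookkeeping,
    // so rank isn't hard-coded the way it is in a 2-D-only matrix type.
    public class NdIndexSketch {
        static long offset(int[] shape, int[] indices) {
            long off = 0, stride = 1;
            for (int d = shape.length - 1; d >= 0; d--) {
                off += (long) indices[d] * stride;
                stride *= shape[d];
            }
            return off;
        }

        public static void main(String[] args) {
            int[] shape = {2, 3, 4};                 // a 2x3x4 tensor
            double[] data = new double[2 * 3 * 4];   // stored as one flat buffer
            data[(int) offset(shape, new int[]{1, 2, 3})] = 42.0; // last element
            System.out.println(data[23]);            // 42.0
        }
    }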
I agree. A lot of that is because no one has taken the lessons hard-learned in every other numerical language and just replicated them (hardware acceleration, BLAS, ndarrays, ...).
The JVM has a lot of matrix library fragmentation.
Without easy access to SIMD or value types (and JNI does not count as easy), the JVM's performance for regular computations on large datasets is still pretty lacking. Quoting http://cr.openjdk.java.net/~jrose/values/values-0.html, "Numeric types like complex numbers, extended-precision or unsigned integers, and decimal types are widely useful but can only be approximated (to the detriment of type safety and/or performance) by primitives or object classes."
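For example, this is the shape of the workaround today (plain Java, nothing library-specific):

    // Sketch: without value types, a Complex class means one heap object and a
    // pointer dereference per element, so numeric code falls back to parallel
    // primitive arrays (structure-of-arrays) instead.
    public class ComplexSketch {
        static final class Complex {       // a full heap object, not a flat value
            final double re, im;
            Complex(double re, double im) { this.re = re; this.im = im; }
        }

        public static void main(String[] args) {
            int n = 1_000_000;

            // Object layout: n separate allocations behind an array of references.
            Complex[] boxed = new Complex[n];
            for (int i = 0; i < n; i++) boxed[i] = new Complex(i, -i);

            // The workaround: contiguous, SIMD-friendly primitive arrays, at the
            // cost of type safety and readability.
            double[] re = new double[n];
            double[] im = new double[n];
            for (int i = 0; i < n; i++) { re[i] = i; im[i] = -i; }

            System.out.println(boxed[10].re + " " + re[10]);
        }
    }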
And the JVM's costs in terms of deployment and startup times aren't very attractive for scientific workloads. You wouldn't gain as much in a migration from Python or Matlab to Java (or any other JVM language) as you would to C++ (or increasingly Julia), since you'll likely still need access to the same native libraries and JNI is kind of a pain.
Having done both Java and scientific programming, I'd say the verbosity of Java really distracts from the task of writing scientific code. An academic usually doesn't care to learn about singletons and classes just to write really fast for-loops. Also, while the speed of Java now is awesome, floating-point performance is key for those scientists who do know more systems-level programming (memory management is a big issue). Actually, GC in Julia can be a pain as well.
I have a passing knowledge of Julia (I've written a simple Lorentz force[0] stepper in it). Is there a way to avoid GC in the middle of a calculation, i.e., in a performance-critical loop?
Which is why a C++ underbelly is so important. The computations shouldn't happen at the Java level if a scientific lib is being done right. I'd say Scala isn't a bad alternative for the front end, which is why we have nd4s, a Scala wrapper over nd4j.
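A minimal sketch of that pattern (the library and method names here are made up for illustration):

    // Sketch: Java declares the entry point and loads a compiled library; the
    // hot loop itself never runs on the JVM.
    public class NativeKernels {
        static {
            System.loadLibrary("mykernels"); // hypothetical libmykernels.so / .dll
        }

        // Implemented in C++ as Java_NativeKernels_axpy, operating on raw
        // addresses obtained as in the off-heap examples further up the thread.
        public static native void axpy(long n, double alpha, long xPtr, long yPtr);
    }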