You seem to be conflating "numerical computing" with machine learning. Numerical computing typically means solving PDEs via e.g. finite elements or finite differences, or solving the large systems of linear equations associated with such methods.
The difference between the two is that when solving PDEs, accuracy is paramount, so even using single precision is a bit of a compromise, whereas in machine learning, the trend seems to be to use half-precision or lower, sacrificing accuracy for speed.
For classical numerical computing, e.g. solving PDEs or linear equations, Java may not be the best choice; see for instance the following paper:
How Java's Floating-Point Hurts Everyone Everywhere:
FWIW, we use BLAS just like MATLAB, NumPy, R, and Julia.
The speed depends on the BLAS implementation.
Nd4j has a concept of "backends". Backends allow us to sub in different BLAS implementations as well as different ways of doing operations.
We have a data buffer type that allows people to specify single or double precision. Those data buffers then have an allocation type that can be JavaCPP pointers, NIO byte buffers (direct/off-heap), or normal Java arrays.
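Conceptually the split looks like this (an illustrative sketch using plain JDK types, not Nd4j's actual DataBuffer API):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.DoubleBuffer;

    // Sketch: the element type (float vs. double) is chosen independently of
    // where the memory lives (off-heap/direct NIO buffer vs. on-heap array).
    public class DataBufferSketch {
        public static void main(String[] args) {
            int length = 1_000_000;

            // Off-heap (direct) allocation: lives outside the GC-managed heap,
            // and its memory can be handed to native BLAS routines.
            ByteBuffer direct = ByteBuffer.allocateDirect(length * Double.BYTES)
                                          .order(ByteOrder.nativeOrder());
            DoubleBuffer offHeap = direct.asDoubleBuffer();
            offHeap.put(0, 3.14);

            // On-heap allocation: a normal Java array, simplest to use but it
            // has to be copied or pinned before native code can touch it.
            double[] onHeap = new double[length];
            onHeap[0] = 3.14;

            System.out.println(offHeap.get(0) + " " + onHeap[0]);
        }
    }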
We are also currently working on surpassing the JVM's memory limits ourselves via JavaCPP's pointers. That gives us the 64-bit addressing people normally have access to in C++.
Every current JVM matrix lib that uses netlib-java or jblas is also going to have problems with JVM-to-native communication. The reason is that passing Java arrays and byte buffers across JNI is slower than how we handle it, which is by passing around longs (raw pointer addresses) that are either accessed via Unsafe or allocated in JNI, where we retain the pointer address directly. We expose that to our native operations.
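As an illustration of the raw-address idea (a standalone sketch, not our actual code path):

    import sun.misc.Unsafe;
    import java.lang.reflect.Field;

    // Sketch: allocate memory outside the GC heap and refer to it only by its
    // raw 64-bit address, which is what a native BLAS kernel ultimately wants.
    public class RawPointerSketch {
        private static final Unsafe UNSAFE = loadUnsafe();

        private static Unsafe loadUnsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }

        public static void main(String[] args) {
            long n = 4;
            long address = UNSAFE.allocateMemory(n * Double.BYTES); // a plain long
            try {
                for (long i = 0; i < n; i++) {
                    UNSAFE.putDouble(address + i * Double.BYTES, i * 1.5);
                }
                // A JNI call would receive 'address' as a jlong and cast it to
                // double* on the C++ side; here we just read it back.
                System.out.println(UNSAFE.getDouble(address + 2 * Double.BYTES));
            } finally {
                UNSAFE.freeMemory(address);
            }
        }
    }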
We are solving this by writing our own C++ backend, libnd4j, which supports CUDA as well as normal OpenMP-optimized for-loops for computation.
We also offer a unified interface to cuBLAS and CBLAS (the latter implemented by OpenBLAS as well as MKL).
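Roughly the kind of signature that unified layer exposes (simplified and hypothetical, names made up):

    // Hypothetical sketch, not the real Nd4j API: one GEMM entry point that a
    // CPU backend routes to cblas_dgemm (OpenBLAS/MKL) and a GPU backend routes
    // to cublasDgemm, with buffers identified by raw addresses as above.
    public interface Gemm {
        // C := alpha * A * B + beta * C, column-major, no transposes for brevity
        void dgemm(int m, int n, int k,
                   double alpha, long aPtr, int lda,
                   long bPtr, int ldb,
                   double beta, long cPtr, int ldc);
    }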
FWIW, I more or less agree with you, but it doesn't mean it shouldn't exist.
JVM-based environments can bypass the JVM for the heavy numerics, just like Python does now.
The fundamental problem with the JVM is that no one has just taken what works on other platforms and mapped the concepts one to one.
Our idea with Nd4j is to not only allow people to write their own backends, but also to provide a sane default platform for numerical computing on the JVM. Things like garbage collection shouldn't be a hindrance for what is otherwise a great platform for bigger workloads (Hadoop, Spark, Kafka, ...).
In summary, we know the JVM has been bad at this till now; Nd4j is our hope for fixing that.
Looking at it, I don't see anything dealing with sparse matrices or factorization (e.g., LU, QR, SVD). All the Java libraries for SVD are pretty bad. Plus, none of your examples mention double precision. Does the library support it?
I find it interesting that the NumPy comparison doesn't mention which BLAS NumPy is linked against, but it does mention it for Nd4j. NumPy is highly dependent on a good BLAS, and the basic Netlib one isn't that great.
We actually appreciate feedback like this, thank you.
As for netlib-java: it links against any BLAS implementation you give it. It has this idea of a JNILoader, which can dynamically link against a native BLAS at runtime, with the reference BLAS you mentioned as the fallback.
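For example (this follows my reading of netlib-java's documented API, so treat the exact property and method names as approximate):

    import com.github.fommil.netlib.BLAS;

    // Sketch: netlib-java picks a BLAS implementation at runtime; you can force
    // one via a system property before first use, e.g.
    //   -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.NativeSystemBLAS
    //   -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS
    public class NetlibSketch {
        public static void main(String[] args) {
            BLAS blas = BLAS.getInstance();
            System.out.println("Using: " + blas.getClass().getName());

            // 2x2 column-major matrices: C := 1.0 * A * B + 0.0 * C
            double[] a = {1, 2, 3, 4};
            double[] b = {5, 6, 7, 8};
            double[] c = new double[4];
            blas.dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
            System.out.println(java.util.Arrays.toString(c)); // [23.0, 34.0, 31.0, 46.0]
        }
    }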
LAPACK doesn't implement any sparse linear algebra. If you think the landscape of "Java matrix libraries" is fragmented, when really they're all just different takes on wrapping BLAS and LAPACK or writing equivalent functionality in pure Java, wait until you look into sparse linear algebra libraries. There's no standard API; there are 3ish common and a dozen less common storage formats; only one or two of these libraries have any public version control or issue tracker whatsoever; licenses are all over the map. The whole field is a software engineering disaster, and yet it's functionality you just can't get anywhere else.
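To make the storage-format point concrete, here's Compressed Sparse Row (CSR), one of the common formats, as a minimal do-it-yourself sketch of a sparse matrix-vector product:

    // CSR sketch: nonzeros stored row by row, with an index array marking where
    // each row starts. Every library reinvents some variation of this.
    public class CsrSketch {
        final int rows;
        final double[] values;   // nonzero values, row by row
        final int[] colIndices;  // column index of each nonzero
        final int[] rowPtr;      // rowPtr[i]..rowPtr[i+1] spans row i's nonzeros

        CsrSketch(int rows, double[] values, int[] colIndices, int[] rowPtr) {
            this.rows = rows;
            this.values = values;
            this.colIndices = colIndices;
            this.rowPtr = rowPtr;
        }

        double[] multiply(double[] x) {
            double[] y = new double[rows];
            for (int i = 0; i < rows; i++) {
                double sum = 0.0;
                for (int j = rowPtr[i]; j < rowPtr[i + 1]; j++) {
                    sum += values[j] * x[colIndices[j]];
                }
                y[i] = sum;
            }
            return y;
        }

        public static void main(String[] args) {
            // [[1 0 2],
            //  [0 3 0]]
            CsrSketch m = new CsrSketch(2,
                    new double[]{1, 2, 3},
                    new int[]{0, 2, 1},
                    new int[]{0, 2, 3});
            System.out.println(java.util.Arrays.toString(
                    m.multiply(new double[]{1, 1, 1}))); // [3.0, 3.0]
        }
    }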
> However there are quite a few sparse blas and lapack implementations now.
There's the NIST Sparse BLAS, and MKL has a similar but not exactly compatible version. These never really took off in adoption (MKL's widely used of course, but I'd wager these particular functions are not). What sparse LAPACK are you talking about?
> If you want to help us fix it we are hiring ;).
We were at the same dinner a couple weeks ago actually. I'm enjoying where I am using Julia and LLVM, not sure if you could pay me enough to make me want to work on the JVM.
We have this concept of a backend; we had netlib-java and jblas backends, but both are horribly slow/limited and need some updating, which is why we added our own approach on top of this.
If a new matrix framework comes out, I will just write a backend for it and allow people to keep the same DSL.
ND4J supports n-dimensional arrays, while Breeze does not. jblas is great, but it's not fast enough. We've relied on the netlib BLAS in the past, and we're moving to faster computation libs now. We'd love for Spark MLlib to plug into ND4J at some point.
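To illustrate what n-dimensional support means under the hood (a generic sketch of row-major strided indexing, not Nd4j's actual code):

    // Sketch: an n-d array is one flat buffer plus shape/stride bookkeeping,
    // so rank isn't hard-coded the way it is in a 2-D-only matrix type.
    public class NdIndexSketch {
        static long offset(int[] shape, int[] indices) {
            long off = 0, stride = 1;
            for (int d = shape.length - 1; d >= 0; d--) {
                off += (long) indices[d] * stride;
                stride *= shape[d];
            }
            return off;
        }

        public static void main(String[] args) {
            int[] shape = {2, 3, 4};                 // a 2x3x4 tensor
            double[] data = new double[2 * 3 * 4];   // stored as one flat buffer
            data[(int) offset(shape, new int[]{1, 2, 3})] = 42.0; // last element
            System.out.println(data[23]);            // 42.0
        }
    }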
I agree. A lot of that is because no one has taken the lessons hard-learned in every other numerical language and just replicated them (hardware acceleration, BLAS, ndarrays, ...).
The JVM has a lot of matrix library fragmentation.
Without easy access to SIMD or value types (and JNI does not count as easy), the JVM's performance for regular computations on large datasets is still pretty lacking. Quoting http://cr.openjdk.java.net/~jrose/values/values-0.html, "Numeric types like complex numbers, extended-precision or unsigned integers, and decimal types are widely useful but can only be approximated (to the detriment of type safety and/or performance) by primitives or object classes."
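For example, this is the shape of the workaround today (plain Java, nothing library-specific):

    // Sketch: without value types, a Complex class means one heap object and a
    // pointer dereference per element, so numeric code falls back to parallel
    // primitive arrays (structure-of-arrays) instead.
    public class ComplexSketch {
        static final class Complex {       // a full heap object, not a flat value
            final double re, im;
            Complex(double re, double im) { this.re = re; this.im = im; }
        }

        public static void main(String[] args) {
            int n = 1_000_000;

            // Object layout: n separate allocations behind an array of references.
            Complex[] boxed = new Complex[n];
            for (int i = 0; i < n; i++) boxed[i] = new Complex(i, -i);

            // The workaround: contiguous, SIMD-friendly primitive arrays, at the
            // cost of type safety and readability.
            double[] re = new double[n];
            double[] im = new double[n];
            for (int i = 0; i < n; i++) { re[i] = i; im[i] = -i; }

            System.out.println(boxed[10].re + " " + re[10]);
        }
    }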
And the JVM's costs in terms of deployment and startup times aren't very attractive for scientific workloads. You wouldn't gain as much in a migration from Python or Matlab to Java (or any other JVM language) as you would to C++ (or increasingly Julia), since you'll likely still need access to the same native libraries and JNI is kind of a pain.
Having done both Java and scientific programming, I'd say the verbosity of Java really distracts from the task of writing scientific code. An academic usually doesn't care to learn about singletons and classes just to write really fast for-loops. Also, while the speed of Java now is awesome, floating-point performance is key for those scientists who do know more systems-level programming (memory management is a big issue). Actually, GC in Julia can be a pain as well.
I have a passing knowledge of Julia (I've written a simple Lorentz force[0] stepper in it). Is there a way to avoid GC in the middle of a calculation, i.e., in a performance-critical loop?
Which is why a C++ underbelly is so important. The computations shouldn't happen at the Java level if a scientific lib is being done right. I'd say Scala isn't a bad alternative for the front end, which is why we have nd4s, a Scala wrapper over nd4j.
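A minimal sketch of that pattern (the library and method names here are made up for illustration):

    // Sketch: Java declares the entry point and loads a compiled library; the
    // hot loop itself never runs on the JVM.
    public class NativeKernels {
        static {
            System.loadLibrary("mykernels"); // hypothetical libmykernels.so / .dll
        }

        // Implemented in C++ as Java_NativeKernels_axpy, operating on raw
        // addresses obtained as in the off-heap examples further up the thread.
        public static native void axpy(long n, double alpha, long xPtr, long yPtr);
    }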