Newer airports usually try to leave extra space at the ends; that's the only thing that really helps with the physics involved here.
Older airports might have EMAS [1] retrofitted at the ends to help stop planes, but that's designed more for a landing plane not stopping quickly enough (like [2]) - not a plane trying to get airborne as in this case.
"Aurora" at Argonne National Labs is intended to be a bit bigger, but has suffered through a long series of delays. It's expected to surpass Frontier on the TOP500 list this fall once they some issues resolved. El Capitan at LLNL is also expected to be online soon, although I'm not sure if it'll be on the list this fall or next spring.
As others note, these systems are measured by running a specific benchmark - Linpack - and require the machine to be formally submitted. There are systems in China that are on a similar scale, but, for political reasons, have not formally submitted results. There are also always rumors around the scale of classified systems owned by various countries that are also not publicized.
Alongside that, the hyperscale cloud industry has added some wrinkles to how these are tracked and managed. Microsoft occupies the third position with "Eagle", which I believe is one of their newer datacenter deployments briefly repurposed to run Linpack. And they're rolling similar scale systems out on a frequent basis.
At the node level, these systems usually aim for around 90-95% allocated. Compared to most "cloud" applications, hitting that usually involves a number of tricks at the system-scheduling level.
At some point, in order to concurrently allocate a 1000-node job, all 1000 nodes will need to be briefly unoccupied ahead of that, and that can introduce some unavoidable gaps in system usage. Tuning the "backfill" part of the workload manager's scheduler can help reduce that, and a healthy mix of small single-node short-duration work alongside bigger multi-day, multi-thousand-node jobs helps keep the machine busy.
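To make the backfill idea concrete, here's a minimal sketch in Go (not Slurm's actual implementation, just the core rule it applies): a smaller job is allowed to jump ahead only if it fits on currently idle nodes and can finish before the reservation held for the big job begins.

    package main

    import (
        "fmt"
        "time"
    )

    // canBackfill reports whether a small job can start now without delaying
    // the reservation held for a larger pending job: it must fit on the idle
    // nodes and its requested walltime must end before that reservation starts.
    func canBackfill(now, bigJobStart time.Time, walltime time.Duration, idleNodes, wantNodes int) bool {
        return wantNodes <= idleNodes && !now.Add(walltime).After(bigJobStart)
    }

    func main() {
        now := time.Now()
        bigJobStart := now.Add(6 * time.Hour) // a 1000-node job is reserved to start in 6 hours
        fmt.Println(canBackfill(now, bigJobStart, 4*time.Hour, 50, 1)) // true: fits in the gap
        fmt.Println(canBackfill(now, bigJobStart, 8*time.Hour, 50, 1)) // false: would delay the big job
    }

That's also why the mix of short jobs matters: they're the only thing that can slot into those gaps without pushing the big reservation back.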
Almost no modern systems use a torus topology these days - at least not at the node level. The backbone links are still occasionally designed that way, although Dragonfly+ or similar is much more common and maps better onto modern switch silicon.
You're spot on that the bandwidth available in these machines hugely outstrips that in common cloud cluster rack-scale designs. Although full bisection bandwidth hasn't been a design goal for larger systems for a number of years.
LambdaLabs' GPU clusters provide internode bandwidth of 3.2 Tbps: I personally verified it on a 64-node cluster (8xH100 servers), and they claim it holds for clusters of up to 5k GPUs. What is the internode bandwidth of Frontier? Someone claimed it's 200 Gbps, which, if true, would be a huge bottleneck for some ML models.
Bisection bandwidth is the metric these systems will cite, and impacts how the largest simulations will behave. Inter-node bandwidth isn't a direct comparison, and can be higher at modest node counts as long as you're within a single switch. I haven't seen a network diagram for LambdaLabs, but it looks like they're building off 200Gbps Infiniband once you get outside of NVLink. So they'll have higher bandwidth within each NVLink island, but the performance will drop once you need to cross islands.
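For a rough sense of why the two numbers aren't comparable, here's a back-of-the-envelope Go sketch with hypothetical leaf/spine numbers (illustrative only - not LambdaLabs' or Frontier's actual topology): injection bandwidth is what each node can push into its first switch, while bisection bandwidth is what survives when you cut the fabric in half, and uplink tapering shrinks the latter.

    package main

    import "fmt"

    func main() {
        // Hypothetical two-level leaf/spine fabric; real Dragonfly/Slingshot and
        // rail-optimized InfiniBand builds are more complicated than this.
        const (
            nodes          = 64     // compute nodes
            nicGbpsPerNode = 3200.0 // e.g. 8 NICs x 400 Gbps per node
            leaves         = 8      // leaf switches, 8 nodes each
            uplinksPerLeaf = 16     // uplinks from each leaf toward the spine
            uplinkGbps     = 400.0
        )
        injection := nodes * nicGbpsPerNode                        // total injection bandwidth
        bisection := float64(leaves/2*uplinksPerLeaf) * uplinkGbps // capacity crossing the midpoint cut
        fmt.Printf("injection: %.0f Gbps, bisection: %.0f Gbps, taper: %.1f:1\n",
            injection, bisection, injection/2/bisection)
    }

Within a single leaf (or NVLink island) you see the full injection number; once traffic has to cross the cut, the taper is what actually limits you.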
I thought NVLink was only for communication between GPUs within a single node, no? I don't know how big their switches are, but I verified that within a 64-node cluster I got the full advertised 3.2 Tbps bandwidth. So that's 4x as fast as 4x200Gbps, but 800 Gbps is probably not a bottleneck for any real-world workload.
Frontier runs unclassified workloads. Other Department of Energy systems, such as the upcoming "El Capitan" at LLNL (a sibling to Frontier, procured under the same contract) are used for classified work.
I know Golang has its own implementation of sd_notify().
For Slurm, I looked at what a PITA it would be to pull libsystemd into our autoconf tooling, stumbled on the Golang implementation, and realized it's trivial to implement directly.
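For anyone curious what "trivial" means here: the notify protocol is just a datagram write of a short string like READY=1 to the AF_UNIX socket named in $NOTIFY_SOCKET. A minimal Go sketch of that protocol (not Slurm's actual C code) looks roughly like:

    package main

    import (
        "net"
        "os"
    )

    // notify sends a state string such as "READY=1" or "STOPPING=1" to the
    // socket systemd passes in $NOTIFY_SOCKET. If the variable is unset we're
    // not running under Type=notify and silently do nothing. A leading '@' in
    // the path denotes an abstract socket, which Go's net package understands.
    func notify(state string) error {
        path := os.Getenv("NOTIFY_SOCKET")
        if path == "" {
            return nil
        }
        conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: path, Net: "unixgram"})
        if err != nil {
            return err
        }
        defer conn.Close()
        _, err = conn.Write([]byte(state))
        return err
    }

    func main() {
        _ = notify("READY=1")
    }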
> I suppose that you could execute systemd-notify but that's a solution that I would not like.
What I did was to use JNA to call sd_notify() in libsystemd.so.0 (when that library exists), which works but obviously does not avoid using libsystemd. I suppose I could have done all the socket calls into glibc by hand, but doing that single call into libsystemd directly was simpler (and it can be expected to exist whenever systemd is being used).
Caveat is that golang is not a good enough actor to be a reliable indicator of whether this interface is supported, though. They’ll go to the metal because they can, not because it’s stable.
Cluster authentication actually remains the sore point for all interactions - for most systems the lowest common denominator is still whether one can SSH into a login node. At that point the main job commands - sbatch/squeue - are pretty stable.
There have been some attempts to standardize basic job management APIs in the past - DRMAA being one noteworthy example. Although DRMAA v2 was only ever implemented by Grid Engine, is effectively a lightly-abstracted version of their internal APIs, and never really saw first-class adoption by Slurm/PBS/LSF.
For Slurm, the REST API is meant to be the way forward. It punts the authentication problem to, potentially, anything the admins care to wire up through an Apache / NGINX proxy. And the basic job submission and status APIs have stabilized to the point that a client application should be able to consume nearly any version going forward.
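As a rough sketch of what consuming it can look like (assumptions: slurmrestd is exposed behind a proxy at a made-up URL, JWT auth is configured with the token in $SLURM_JWT, and the version prefix in the path matches your release - all of that is site- and version-dependent):

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "os"
    )

    func main() {
        // Hypothetical deployment details: base URL and API version prefix
        // depend entirely on how the site has wired things up.
        base := "https://hpc.example.org/slurmrestd"
        req, err := http.NewRequest("GET", base+"/slurm/v0.0.39/jobs", nil)
        if err != nil {
            panic(err)
        }
        req.Header.Set("X-SLURM-USER-NAME", os.Getenv("USER"))
        req.Header.Set("X-SLURM-USER-TOKEN", os.Getenv("SLURM_JWT"))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status)
        fmt.Println(string(body)) // JSON listing of jobs
    }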
Slurm is endian- and word-size-agnostic; there's nothing about the POWER9 platform that would prevent it from running there. There are occasionally some teething issues with NUMA layout and other Linux kernel differences on newer platforms, but that tends to affect everyone and gets resolved quickly.
My understanding is that Spectrum (formerly Platform) LSF was included as part of their proposal.
[1] https://en.wikipedia.org/wiki/Engineered_materials_arrestor_... [2] https://en.wikipedia.org/wiki/Southwest_Airlines_Flight_1248