It's been a while since I thought about this, but isn't the reason providers advertise only 3.2 Tbps because that's the limit of a single node's connection to the IB network? DGX is spec'ed to pair each H100 with a ConnectX-7 NIC, and those cap out at 400 Gbps. 8 GPUs * 400 Gbps per GPU = 3.2 Tbps.
Quiz 2 is confusingly worded but is, iiuc, referring to intranode GPU connections rather than internode networking.
I believe this is correct. For an H100 node, the 4 NVLink switches each have 64 ports supporting 25 GB/s each, and each GPU uses a total of 18 ports. This gives us 450 GB/s of bandwidth within the node. But once you start trying to leave the node, you're limited by the per-node InfiniBand cabling, which only gives you 400 GB/s out of the entire node (50 GB/s per GPU).
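Quick back-of-the-envelope in Python (port counts and NIC speeds as I remember the DGX H100 spec, so treat the numbers as approximate):

    # Intra-node: NVLink ports per GPU times per-port bandwidth
    nvlink_ports_per_gpu = 18
    gb_per_s_per_port = 25
    print(nvlink_ports_per_gpu * gb_per_s_per_port)  # 450 GB/s per GPU inside the node

    # Inter-node: one 400 Gbps ConnectX-7 NIC per GPU, 8 GPUs per node
    nic_gbps = 400
    gpus_per_node = 8
    node_tbps = nic_gbps * gpus_per_node / 1000
    print(node_tbps)  # 3.2 Tbps per node = 400 GB/s, i.e. 50 GB/s per GPU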
I had this problem and complained to AT&T support. I know it doesn't make sense to replace the modem because this is a software issue, not a hardware one, but they very quickly offered to replace the modem, and now my DNS isn't getting hijacked. Would recommend trying it!
Wonder if there's been a quiet hardware rev and they can't/don't want to update the firmware on the old units. I ran into that once on Spectrum - I bought the "same" modem to replace one, and suddenly my IPv6 config was borked.
Turns out Spectrum (in my region) actually pushes a directive in their config file to disable IPv6, even though their dual-stack network works great and has been working for at least 8 years now. Some modems apparently "override" that directive (i.e. ignore it and try to configure the IPv6 stack anyway) and you get fully functional IPv6 service. Other modems play goody-two-shoes and you're stuck with only IPv4. The new modem I bought was sold/marketed as the same model but was internally a totally different radio chipset. It was pulling a different firmware rev which had evidently been patched to actually obey the IP provisioning mode.
Spectrum support told me, basically, that if I have working IPv4 connectivity then my service is considered functional and there is nothing they can do. I gave up playing the support game and ended up exchanging modems until I landed on a Motorola one that gleefully ignores that config parameter.
I wish I knew how to actually state my case to someone at Spectrum with the authority to actually fix their busted provisioning profiles, because it's kind of crazy to me that they've basically bifurcated IPv6 in this market based on whether or not your modem feels like reading the whole config file ;-P. (What's even funnier is the modems they install must be spec non-compliant, because IPv6 at the office works fine and that's their leased equipment.)
I found that using "helm template" to render every Helm chart to YAML, and then using Pulumi to track changes and update my clusters (with Python transformation functions for per-cluster configuration), made my life so much better than using Helm directly. Watching Pulumi or Terraform watch Helm watch Kubernetes update a deployment felt pointlessly complicated.
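Roughly, the setup looks like this (a minimal sketch; the chart path, release name, and the nodeSelector tweak are made-up placeholders):

    import subprocess
    import pulumi_kubernetes as k8s

    # Render the chart to plain YAML ourselves instead of letting Helm own the release.
    rendered = subprocess.run(
        ["helm", "template", "my-release", "./charts/my-chart"],
        check=True, capture_output=True, text=True,
    ).stdout

    def per_cluster_tweaks(obj, opts):
        # Per-cluster transformation: e.g. pin Deployments to a particular node pool.
        if obj.get("kind") == "Deployment":
            pod_spec = obj["spec"]["template"]["spec"]
            pod_spec.setdefault("nodeSelector", {})["gpu-type"] = "h100"

    # Pulumi diffs and applies the rendered objects like any other resources.
    k8s.yaml.ConfigGroup(
        "my-chart",
        yaml=[rendered],
        transformations=[per_cluster_tweaks],
    )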
The main thing is that it's still kicking; Ksonnet is sadly dead.
In addition to that, it supports importing Helm charts, has a blessed convention for multiple environments, and ships several native Jsonnet functions that make things a bit nicer.
This is both a good thing and a bad thing, but Pulumi is way more flexible than Terraform. I wanted a cloud-provider-specific submodule that created resources (like EKS and GKE) and exported output values (think kubeconfig), and then I wanted the parent module to pass those in as inputs to a cloud-provider-independent submodule. Terraform couldn't do it without duplicating a ton of code, or without something heinous like Terragrunt (not sure that even would have worked). Pulumi makes it trivial, and in a language I like writing.
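The pattern is roughly this (a sketch, not our actual code; the component names and the placeholder kubeconfig are hypothetical):

    import pulumi

    class EksCluster(pulumi.ComponentResource):
        # Cloud-provider-specific layer: in reality this would create aws.eks.Cluster etc.
        def __init__(self, name, opts=None):
            super().__init__("pkg:cloud:EksCluster", name, None, opts)
            # Stand-in for the real rendered kubeconfig output.
            self.kubeconfig = pulumi.Output.secret("...kubeconfig yaml...")
            self.register_outputs({"kubeconfig": self.kubeconfig})

    class Workloads(pulumi.ComponentResource):
        # Cloud-provider-independent layer: only needs a kubeconfig as input.
        def __init__(self, name, kubeconfig, opts=None):
            super().__init__("pkg:apps:Workloads", name, None, opts)
            # e.g. build a pulumi_kubernetes.Provider from kubeconfig and deploy things with it.
            self.register_outputs({})

    cluster = EksCluster("prod-eks")
    Workloads("prod-apps", kubeconfig=cluster.kubeconfig)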
Additionally, our applications consume our cloud configuration (e.g. something that launches pods on heterogeneous GPUs needs to know which clusters support which GPUs; our colo cluster has H100s but our Google cluster has A100s, etc.). Writing in the same language in the same monorepo makes it very easy to share that state.
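Concretely, that shared state can just be a plain module imported by both the Pulumi program and the scheduling code (hypothetical names, but this is the shape of it):

    # clusters.py - imported by infrastructure code and by the pod launcher alike.
    CLUSTER_GPUS = {
        "colo-main": ["H100"],
        "gcp-us-central1": ["A100"],
    }

    def clusters_with(gpu_type):
        # Return the clusters that can run a pod requesting this GPU type.
        return [name for name, gpus in CLUSTER_GPUS.items() if gpu_type in gpus]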
No, it is not. That's the sparse FP8 FLOPS number, but you need to ignore sparsity and compare BF16 FLOPS, not FP8 FLOPS, for the comparison the ancestor post is making.
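As a rough sanity check (assuming the usual datasheet conventions, where structured sparsity doubles the headline number and FP8 is twice BF16 throughput):

    sparse_fp8_tflops = 3958              # H100 SXM headline figure, if I remember the datasheet right
    dense_fp8_tflops = sparse_fp8_tflops / 2
    dense_bf16_tflops = dense_fp8_tflops / 2
    print(dense_bf16_tflops)              # ~989 TFLOPS, the number to use for a BF16 comparison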
It's worth noting that just because an H100 has a higher FLOPS number doesn't mean your program is actually hitting that number. Modern TPUs are surprisingly competitive with Nvidia on a perf/$ basis; if you're doing cloud ML they are absolutely worth a look. We have been keeping costs down by racking our own GPUs, but TPUs are so cost effective that we need to do some thinking about changing our approach.
I'm not certain but I think part of this is that XLA (for example) is a mountain of chip-specific optimizations between your code and the actual operations. So comparing your throughput between GPU and TPU is not just flops-to-flops.
Great question! Reshape was definitely a source of inspiration, and in fact, our first PoC version was based on it.
We decided to start a new project for a couple of reasons. First, we preferred to have it in Go, so we can integrate it more easily into Xata. And we wanted to push it further, based on our experience, to also deal with constraints (in Reshape, constraints are shared between versions).
A friend and I have been working on this for quite a while. It's frustrating because it has come so far (showing trail data worldwide interactively on a tiny budget is challenging), and yet it is so far from being something super useful like AllTrails due to low data quality and a lack of relevant features (photos, reviews, etc.).
Ahh, this is a good idea; my friend has in fact been asking me to add it forever. I've been hesitating since the data seems so sparse, but I think you're both right. Thank you for the feedback!