I have played with this but been underwhelmed. However, I do think it's probably on the right track.
I know the ecosystem not at all (my sum total knowledge of the CAD ecosystem is that my kids got a Bambu printer for Hanukkah), but it feels to me that current LLMs should be able to generate specs for something like https://partcad.readthedocs.io/en/latest/, which can then be sliced, etc.
Curious to know what others think? I come at this from a position of zero interest in developing the fine design skills needed to master CAD, but I do want to be able to build and tweak basic functional designs.
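To make that concrete, here's a rough sketch (not something I've actually run through PartCAD) of the kind of parametric script an LLM could plausibly emit from a plain-English prompt like "a 40x40x5 mm mounting plate with four M3 holes". I'm using CadQuery here because I believe it's one of the script formats PartCAD can consume; the dimensions and file name are made up for illustration.

```python
# Hypothetical LLM output for: "a 40x40x5 mm mounting plate with four M3 holes".
# Dimensions and file name are illustrative only.
import cadquery as cq

plate = (
    cq.Workplane("XY")
    .box(40, 40, 5)                      # base plate, 40 x 40 x 5 mm
    .faces(">Z").workplane()
    .rect(30, 30, forConstruction=True)  # construction rectangle to place the holes
    .vertices()
    .hole(3.2)                           # M3 clearance holes at each corner
)

# Export to STL so a slicer (Bambu Studio, PrusaSlicer, ...) can take it from there.
cq.exporters.export(plate, "mounting_plate.stl")
```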
> The unification and seamless workflow at that scale is painfully hard to achieve
It does make you wonder: why not just be a lot smaller? It's not like most of these teams actually generate any revenue. It seems like a weird structural decision that maybe made sense when hoovering up available talent was its own defensive moat, but now that that strategy is no longer plausible, it should be rethought.
Two reasons. 1 - They print cash through ads, which creates the opportunity and the desire (or even a feeling of obligation) to do more things, so new products emerge, partly to diversify the revenue stream. 2 - Continuous hiring and scale mean churn: people get bored, they leave teams, they want to do something new, and it all bifurcates. It keeps fragmenting and fragmenting until you have this multilayered fractal. It's how systems in nature operate, so we shouldn't expect corporations to be any different. The only way to mitigate this is to put limits, rules, and boundaries in place, but that also limits the upside, and if you're a public company you can't do that. You have to grow grow grow and then cut cut cut, and continue in that cycle forever, or until you die.
One thing I find interesting about discussions of typography in Cyrillic is how poor the overall readability of text is in most fonts compared to Latin, because of the relative scarcity of ascenders and descenders (e.g. p, q, l, t).
One of my tutors at university claimed that she was able to read 9th century manuscript Cyrillic faster than modern printed books because the orthography was more varied and easier to scan/speed-read.
I remember seeing some studies that experimentally show this to be true for Hebrew (another de/ascender-poor writing system), but can't find them at the moment.
Thanks for the factual explanation! I found the example Cyrillic texts unreadable: a set of horizontal lines (the serifs) and vertical lines (the characters themselves) giving the feeling of a grid. But I dismissed that as "I can't read Cyrillic anyway."
Now that you've written it down, it actually does make sense.
It’s hard to bet against the foundation models winning consumer use cases where you can reimagine the whole product as a single tool (or a small number of tools) that can be dynamically plugged into the underlying model and doesn’t require access to proprietary data or custom context.
I think this is a relatively succinct summary of the downside case for LLM code generation. I hear a lot of this, and as someone who enjoys a well-structured codebase, I have a lot of instinctive sympathy.
However I think we should be thinking harder about how coding will change as LLMs change the economics of writing code:
- If the cost of delivering a feature is ~0, what's the point in spending weeks prioritizing it? Maybe Product becomes more like an iterative QA function?
- What are the risks that we currently manage through good software engineering practices and what's the actual impact of those risks materializing? For instance, if we expose customer data that's probably pretty existential, but most companies can tolerate a little unplanned downtime (even if they don't enjoy it!). As the economics change, how sustainable is the current cost/benefit equilibrium of high-quality code?
We might not like it, but my guess is that in ≤ 5 years actual code is more akin to assembler: sure, we might jump in and optimize, but we are really just monitoring test suites, coverage, and risk, rather than worrying about whether a shared library function is evolving in a way that gives leverage across the code base.
> As the economics change, how sustainable is the current cost/benefit equilibrium of high-quality code
"High quality code"? The standard today is "barely functional", if we lower the standards any further we will find ourselves debating how many crashes a day we're willing to live with, and whether we really care about weekly data loss caused by race conditions.
> However I think we should be thinking harder about how coding will change as LLMs change the economics of writing code: - If the cost of delivering a feature is ~0, what's the point in spending weeks prioritizing it?
Writing code and delivering a feature are not synonymous. The time spent writing code is often significantly less than the time spent clarifying requirements, designing the solution, adjusting the software architecture as necessary, testing, documenting, and releasing. That effort won't be driven to 0 even if an LLM could be trusted to write perfect code that didn't need human review.
I agree with your point about finding a new standard for what developers should do given LLM coding. Something that mattered before may not be relevant in the future.
My experience so far boils down to: APIs, function descriptions, overall structures, and testing. In other words, ask a dev to become an architect who defines the project and lays out the structure. As long as the first three points are well settled, code-gen quality is pretty good. Many people believe the last point (testing) should be done automatically as well. While LLMs may help with unit tests or tests of macro structures, I think people need to define high-level, end-to-end testing goals from a new angle. See the sketch below for what I mean.
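A toy illustration of that division of labor (names and numbers here are hypothetical): the human writes the types, the signature, the docstring, and the end-to-end expectation; code generation only has to fill in the body.

```python
# Human-authored "spec": types, signature, docstring, and an end-to-end test.
from dataclasses import dataclass

@dataclass
class Invoice:
    subtotal_cents: int
    tax_rate: float  # e.g. 0.08 for 8%

def invoice_total_cents(invoice: Invoice) -> int:
    """Return the subtotal plus tax, rounded to the nearest cent."""
    # Body left to code generation; the signature and docstring are the contract.
    return round(invoice.subtotal_cents * (1 + invoice.tax_rate))

def test_invoice_total_includes_tax():
    # High-level expectation the generated code must satisfy.
    assert invoice_total_cents(Invoice(subtotal_cents=1000, tax_rate=0.08)) == 1080
```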
The question is whether treating code as a borderline black box balances out with the needed extra QA (including automated tests).
Just as strong typing reduces the number of tests you need (because the scope of potential errors is reduced), there is a giant increase in error scope when you can’t assume the writer is rational.
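A minimal example of the typing point, using Python type hints plus a checker like mypy (purely illustrative):

```python
# With the annotation checked statically, the "caller passed the wrong kind of
# thing" class of errors is ruled out before runtime, so tests can concentrate
# on actual logic (empty input, rounding, etc.).
def mean(values: list[float]) -> float:
    return sum(values) / len(values)

# mean("abc")       # flagged by the type checker; no runtime test needed
# mean([])          # still needs a test/decision: this raises ZeroDivisionError
```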
Interesting that the infographic (which I thought was exceptionally well designed, well done USGS) found it necessary to call out that 0 billion gallons/day goes to Mexico. Was this done under the previous administration or this one, I wonder? I do recall reading something about disputes between the US and Mexico over water abstraction (presumably from the Rio Grande or similar).
It might be because there used to be a significant flow from the Colorado River into Mexico, but we extract so much that it now runs dry before it reaches the border.
Also, it might just be to show that the neighboring country on the other side isn't getting anything.
I once lived in Moscow on a compound with an adopted rescue dog. The compound had a shock collar and invisible fence setup.
Moscow’s street dogs are renowned for their intelligence. I have seen street dogs taking the escalators on the Metro. This dog worked out not just that the beeping + discomfort was worth the freedom, but also that he could wear out the battery faster by going up to the very edge of the fence - where the chirps became an uninterrupted beeeeep - and as soon as the beeping stopped, whoosh he was gone.
Fundamentally, we are at a point in time where models are already very capable, but not very reliable.
This is a very interesting finding about how to improve capability.
I don't see reliability expressly addressed here, but my assumption is that these alloys will be less rather than more reliable - stronger, but more brittle, to extend the alloy metaphor.
Unfortunately for many if not most B2B use cases this reliability is the primary constraint! Would love to see similar ideas in the reliability space.
Great question. For me reliability is variance in performance and capability is average performance.
In practice, high variance translates on the downside into failures to do basic things that a minimally competent human would basically never get wrong. In agents it's exacerbated by the compounding impact of repeated calls, but even for basic workflows it can be annoying.
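A back-of-the-envelope version of that compounding effect (assuming, optimistically, independent steps):

```python
# If each step of an agent workflow succeeds with probability p, an n-step run
# succeeds with probability roughly p**n.
for p in (0.99, 0.95, 0.90):
    for n in (5, 10, 20):
        print(f"per-step {p:.0%}, {n} steps -> {p ** n:.1%} end-to-end")
# e.g. 95% per step over 20 steps is only ~36% end-to-end.
```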
I don’t think variance is relevant to this application, which is essentially a search function. As long as they find the answer 1 time in 100, it doesn’t matter that it took 100 tries - that’s just a cost-optimization problem here.
That being said, I think variance implicitly improves in this context, because this is the same as the poll averaging Nate Silver does: as long as the models are truly independent, averaging improves results across the board (i.e., both the average and the variance). However, if the models start converging on the same datasets and techniques, this degrades, just as polling does with pollster herding and the other problems that industry creates for itself.
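If you model the n models as equicorrelated estimators with common variance sigma^2 and pairwise correlation rho (a simplifying assumption), the variance of their average is sigma^2/n + ((n-1)/n) * rho * sigma^2, which makes the herding risk concrete. A quick sketch with illustrative numbers:

```python
# Variance of the average of n equicorrelated estimators: as the pairwise
# correlation rho approaches 1 (models trained on the same data with the same
# techniques), the benefit of averaging disappears.
def variance_of_average(sigma2: float, n: int, rho: float) -> float:
    return sigma2 / n + (n - 1) / n * rho * sigma2

for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho}: variance of a 10-model average = {variance_of_average(1.0, 10, rho):.2f}")
# 0.10 when independent, 0.55 at rho=0.5, 0.91 at rho=0.9 -- barely better than one model.
```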
Re-read “The Art of Not Being Governed” by James C. Scott, which is really mind-expanding stuff.