Hacker News | SeanAppleby's comments

One thing I've been tossing around in my head is:

- How quickly is the cost of refactoring to a new pattern with functional parity going down?

- How does that change the calculus around tech debt?

If engineering uses 3 different abstractions in inconsistent ways, leaking implementation details across components and duplicating functionality in ways that are very hard to reason about, then in conventional terms that is an existential problem that might kill the entire business: all dev time ends up consumed by bug fixes and pointless complexity, velocity falls to nothing, and the company stops being able to iterate.

But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed. And it has changed more if you are willing to bet that models within a year will be much better at such tasks.

And in my experience, Claude is imperfect at refactors and still requires review and a lot of steering, but it's one of the things it's best at, because a refactor comes with clear requirements and testing workflows already built around the existing behavior. Refactoring is definitely a hell of a lot faster than it used to be, at least on the few I've dealt with recently.

In my mind it might be kind of like thinking about financial debt in a world with high inflation, in that the debt seems like it might get cheaper over time rather than more expensive.


> But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed.

Yup, I recently spent 4 days using Claude to clean up a tool that's been in production for over 7 years. (There's only about 3 months of engineering time spent on it in those years.)

We've known what the tool needed for many years, but ugh, the actual work was fairly messy and it was never a priority. I reviewed all of Opus's cleanup work carefully and I'm quite content with the result. Maybe even "enthusiastic" would be accurate.

So even if Claude can't clean up all the tech debt in a totally unsupervised fashion, it can still help address some kinds of tech debt extremely rapidly.


Good point. Most of the cost in dealing with tech debt is reading the code and noting the issues. I've found that Claude produces much better code when it has a functionally correct reference implementation. You also don't need to point out issues very specifically. I once mentioned "I see duplicate keys in X and Y, rework it to reduce repetition and verbosity", and it came up with a much more elegant implementation.

So maybe doing 2-3 stages makes sense. The first stage needs to be functionally correct, but you accept code smells such as leaky abstractions, verbosity and repetition. In stages 2 and 3 you eliminate all of this. You could integrate this all into the initial specification; you won't even see the smelly intermediate code; it only exists as a stepping stone for the model to iteratively refine the code!
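A minimal sketch of that staged workflow might look like the following. `call_model` is a hypothetical stand-in for whatever LLM client you use (not a real API); each stage feeds the previous stage's output back in as the functionally correct reference implementation:

```python
# Hypothetical staged-refinement loop: stage 1 aims for correctness only,
# later stages clean up the smells while preserving behavior.

STAGES = [
    "Implement the spec. Correctness only; code smells are acceptable.",
    "Refactor the reference below to remove duplication and leaky "
    "abstractions. Preserve behavior exactly.",
    "Polish naming and structure. Preserve behavior exactly.",
]

def staged_generation(spec: str, call_model) -> str:
    """Run the spec through each stage, feeding the previous stage's
    output back in as a reference implementation."""
    code = ""
    for instruction in STAGES:
        prompt = (
            f"{instruction}\n\nSpec:\n{spec}\n\n"
            f"Reference implementation:\n{code}"
        )
        code = call_model(prompt)  # in practice: run tests between stages
    return code
```

In practice you'd want to run the test suite between stages so a later "cleanup" pass can't silently break the behavior stage 1 established.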


Animal studies seem like the best tool for untangling this, and they indicate that high plastic doses cause a variety of health effects, some of which seem to align with broad health trends we see in our population over time, like in fertility.

It's not like there's zero data to inform the risk calculation.


I am quite confident that at least some use cases for injecting context at inference time are going to stay for the foreseeable future, regardless of model performance and scaling improvements, because IME those aren't the primary problems the pattern solves.

If you are dealing with high-cardinality permissioning models (even just a large number of users who own their own data; the problem compounds if you have overlapping permissions), then tuning a separate set of layers for every permission set is always going to be wasteful. Trusting a model to have some kind of "understanding" of its permissioning seems plausible assuming some kind of omniscient and perfectly aligned machine, but it's unrealistic in the foreseeable future and definitely not going to cut it for data regs.
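The alternative to trusting the model is enforcing permissions in ordinary code before any context reaches it. A minimal sketch, with illustrative document and user structures (not from any real system):

```python
# Permission filtering at retrieval time: access control happens in plain
# code before anything is injected into the prompt, so the model never
# sees documents the requesting user cannot read.

from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    allowed_users: frozenset  # user ids permitted to read this doc

def retrieve(scores: dict, docs: dict, user: str, k: int = 3) -> list:
    """Rank docs by (precomputed) relevance score, keeping only those
    the requesting user is permitted to read."""
    visible = [d for d in docs if user in docs[d].allowed_users]
    ranked = sorted(visible, key=lambda d: scores.get(d, 0.0), reverse=True)
    return [docs[d].text for d in ranked[:k]]
```

In a real system the filter would be a metadata predicate pushed down into the vector store's query, but the principle is the same: the permission check is deterministic code, not model judgment.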

Also, in the current status quo I don't believe there is a solution on the horizon for continuous, rapid incremental training in prod, so any data sources that change often are also going to be best addressed this way. That will most likely be solved at some point, but it doesn't seem imminent. And regardless, there will likely be some cost/performance balancing where injecting post-watermark context at inference time still makes sense, to keep training costs manageable rather than iterating training on literally every single interaction.

But yeah, if you're just using it because you have a single collection of context for many users which is too large to fit into the prompt, that does seem subject to the problem you're describing. Although there might still be some benefit in keeping the prompt short (for cost) and focused (for performance).


This is my problem with every end-to-end system I've seen around this. I find that, even building these systems from scratch, all of the hard parts are just normal data infrastructure problems. The "AI" part takes a small fraction of the effort to deliver, even when building the RAG part directly on top of huggingface/transformers.

I've also dealt with what you're describing, but IME it goes much further in prod. The ingestion side is even messier in ways these kinds of platforms don't seem to help with. When managing multiple tools in prod with overlapping, non-constant data sources (say, two tools that both need to know the price of a product, which can change at any time), I need both to be built on the same source of truth, and for that source of truth to be fed by our data infra in real time, with relevant documents replaced in real time in a more or less atomic way.
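The "replace documents atomically when upstream data changes" requirement can be sketched as a tiny in-memory index keyed by source id. Names here are hypothetical; a real system would do this against the vector store:

```python
# Atomic per-source replacement: all chunks derived from one upstream
# record (e.g. a product) are swapped in a single step, so no query ever
# sees a mix of old and new chunks for the same source.

import threading

class SourceKeyedIndex:
    def __init__(self):
        self._chunks = {}            # source_id -> list of chunk texts
        self._lock = threading.Lock()

    def replace_source(self, source_id: str, new_chunks: list) -> None:
        # One assignment under the lock: readers see either the old
        # chunks or the new ones, never a partial mix.
        with self._lock:
            self._chunks[source_id] = list(new_chunks)

    def all_chunks(self) -> list:
        with self._lock:
            return [c for chunks in self._chunks.values() for c in chunks]
```

Keying every derived chunk by its upstream source id is what makes the real-time feed tractable: when the price changes, the data infra re-derives that one source's chunks and swaps them as a unit.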

Then, I have some tools with varying levels of permissioning on those overlapping data sources. Say you have two tools in a classroom: one that helps the student based on their work, and another used by the TA or teacher to understand students' answers in a large course. They have overlapping data needs on otherwise private data, and this kind of permissioning layer, which is pretty trivial in a normal webapp, has IME had to be implemented basically from scratch on top of the vector db and retrieval system.

Then experimentation, eval, testing, and releases are the hardest and most underserved parts. It was only relatively recently that anyone even seemed to be talking about eval as a problem to aspire to solve. There's a pretty interesting and novel interplay between production ML eval, potentially sparse data, and conventional unit testing. This is the area we had to put the most of our own thought into before I felt reasonably confident putting anything into prod.
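One way that interplay shows up is gating releases on an aggregate score over a small labeled set rather than asserting every case, which tolerates sparse and noisy data better than per-example unit tests. A hedged sketch, where `answer` stands in for whatever system is under test:

```python
# Eval-as-release-gate: run a labeled set through the system and gate on
# aggregate quality, rather than requiring perfection on each example.

def accuracy(cases, answer) -> float:
    """Fraction of (input, expected) cases the system gets right."""
    if not cases:
        return 0.0
    hits = sum(1 for inp, expected in cases if answer(inp) == expected)
    return hits / len(cases)

def eval_gate(cases, answer, threshold: float = 0.8) -> bool:
    # A release passes if aggregate quality clears the bar; individual
    # misses are expected and tracked rather than treated as failures.
    return accuracy(cases, answer) >= threshold
```

The threshold is a product decision, not a testing one, which is part of why this sits so awkwardly between conventional CI and production ML monitoring.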

FWIW we just built our own internal platform on top of langchain a while back, seemed like a good balance of the right level of abstraction for our use cases, solid productivity gains from shared effort.

I think this is a really interesting problem space, but yeah, I'm skeptical of all of these platforms as they seem to always be promising a lot more than they're delivering. It looks superficially like there has been all of this progress on tooling, but I built a production service based on vector search in 2018 and it really isn't that much easier today. It works better because the models are so much better, but the tools and frameworks don't help that much with the hard parts, to my surprise honestly.

Perhaps I'm just not the user and am being excessively critical, but I keep having to deal with execs and product people throwing these frameworks at us internally without understanding the alignment between what is hard about building these kinds of services in prod and what these kinds of tools make easier vs harder.


This is AMAZING feedback, and it's in line with what I've heard from a number of builders. Thanks for sharing your experiences here.

The infra challenges are real; they're what I've been struggling with most in providing high-quality support for early users. Most want to reliably firehose 10s to 100s of GBs of data through a brittle multistep pipeline. This was something I struggled with when building AgentSearch [https://huggingface.co/datasets/SciPhi/AgentSearch-V1] with LOCAL data, so introducing the networking component only makes things that much harder.

I think we have a lot of work to do to robustly solve this problem, but I'm confident that there is an opportunity to build a framework that results in net positives for the developer.

FWIW, your feedback would be invaluable as the project continues to grow.


I grew up working class and found that VCs were more accessible than, in my experience, any other upper class institution I have seen.

When I was younger I had an easier time getting meetings with partners at decent VC funds (including YC) than interviews with Google, for example. And accordingly an easier time getting seed funding than a prestigious internship.

VCs would generally look at prototypes and listen to the story, if you made the initial case concisely and it made sense, whereas other institutions would just throw my resume away with no calls because it didn't match whatever filters. I just wouldn't even get to talk to hiring managers at decent companies.

I didn't raise a really meaningful amount of money, but it seems implausible that VCs/angel investors who were willing to give me five-six figures for pre-seed wouldn't have given me six-seven in the next round if traction was there.

They said they would, and if they wouldn't, they knew the company had a runway such that it would need to raise again, so giving me anything would have been irrational. If I was going to be discriminated against for being from a working class background/not going to a good enough school/being a technical cofounder who didn't study CS, it would probably be at the very beginning.


Willing to believe that this used to be the case, especially when YC was less famous and G was wildly elitist in its hiring practices (less so now). But I’ve seen very few venture funded startup founders, then or now, who lack either family wealth or an elite education.

In fact I’d say the diversity of founders seems worse than it was five years ago, back then it seemed like founders from underrepresented groups were becoming more common.


The implicit assertion is that this behavior wouldn't happen if not for Zuckerberg personally causing the behavior.

As someone who has worked at large tech companies though, I find that to be an extremely questionable assertion.

FB has incentives. Zuckerberg didn't invent the dynamics surrounding their business, and I don't see how having a faceless bureaucracy in charge would lead to an organization that is more willing to reject its own incentives.

If anything it would seem like having more obscure and diffuse leadership would lead to less accountability, not more.


This week’s EconTalk talks about that concept a bit. It gets a little tedious in spitballing solutions but I guess someone has to start somewhere.

https://www.econtalk.org/arnold-kling-on-reforming-governmen...


Eichmann in Jerusalem is the book that coined the phrase for anyone passing through, and it's a pretty wild story.

It's essentially Arendt, a Jewish exile from Berlin who fled the Holocaust, wrestling with her realization that Eichmann, who organized major portions of the Holocaust, wasn't a psychopath. He was a completely mundane, thoughtless, career-focused bureaucrat trying to rise in government, who believed in doing what you are told, and who organized one of the most evil acts in human history without ever reflecting on what he was doing.


I believe the point is that the answer just raises another "why" question.

In their drinking example, alcoholism literally is overdrinking, but we accept that there is an underlying mechanism of physical and psychological dependency that exists outside of "they just consciously choose to drink all of the time and other people don't". At this point we generally recognize it as a disease with more nuance than "overdrinking". We recognize that there is a significant neurological component that is, by the time someone is an alcoholic, not under their total conscious control. Their subconscious is pushing them to do things that the subconsciouses of people without the disease do not push for.

Similarly, you could ask the question, why do some people eat significantly more than they burn? And it then seems not implausible that the problem could be similar to that of alcoholism, that some neurological system is calling for the body to ingest more food, whereas other people's appetites are fundamentally more accurately calibrated at a subconscious level.

Much more speculative, but this would even seem to make sense from a wider lens. Almost our entire evolutionary lineage existed in a world of food scarcity, not overabundance. The selection pressure needed to evolve a reliable safeguard against overeating plausibly hasn't existed long enough for that mechanism to become as reliable and widespread as the one that keeps us from starving to death.

The health problems of the 20th and 21st century are still incredibly new from the perspective of the mechanisms that created our instinctual impulses.


Agreed, I'm not sure how any of this disagrees with:

"It is an explanation. The article does not propose an alternative to the imbalance explanation rather alternative focus of action to achieve said intake balance."

I understand the article wishes to point out that we should focus on a different action, but it spends its time attacking the current definition, not because it has a problem with the definition itself, but because it has a problem with the currently popular action associated with it.

The popular definition is fine; it's the popular "solution" built on that definition they attack. The paper is much clearer on this.


I promise you that you have a vastly higher opinion of people who work at google than people who work at google do.


They definitely think they're innately superior to people like me. If anything I'm underplaying it.

When you're given everything from peer bonuses to spot bonuses to first class flights to Hawaii to unlimited external validation from everyone else - it's not surprising.


I think it depends purely on your definition of "tech".

Impossible engineering soybeans to produce more heme, so that fake meat behaves more like meat, is a technology, in that it is a novel innovation applied to solve a real-world problem.

But in modern common parlance, "tech" tends to mean that a company's offering is entirely or heavily augmented by new software, or that the company has ties to a specific network of talent/investors/etc, or that the company has very low incremental costs per user. In those cases, they might not be a tech company.

