herrington_d's comments

There is research backing some version of "typed languages are better for LLMs". For example, https://arxiv.org/abs/2504.09246, Type-Constrained Code Generation with Language Models, constrains the LLM's output with a type checker.

Also https://arxiv.org/abs/2406.03283, Enhancing Repository-Level Code Generation with Integrated Contextual Information, uses static analyzers to produce prompts with more contextual information.

That said, these results do not directly translate to the conclusion that typed languages are rigorously better for LLMs without external tools. However, typed languages and their static analysis information do seem to help LLMs.


Dynamically typed languages are far from "untyped". Though they may well require more effort to analyze from scratch without making assumptions, there is nothing inherently preventing type-constrained code generation of the kind the first paper proposes even without static typing.

A system doing type-constrained code-generation can certainly implement its own static type system by tracking a type for variables it uses and ensuring those constraints are maintained without actually emitting the type checks and annotations.
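
A minimal sketch of that idea in TypeScript (the names and structure are mine, not the paper's): the generator keeps a type environment for the variables it has emitted and filters out candidate code that would violate it.

    // Sketch of a generator-side type environment; names are illustrative.
    type Ty = "number" | "string" | "boolean";

    class TypeEnv {
      private vars = new Map<string, Ty>();

      setType(name: string, ty: Ty): void {
        this.vars.set(name, ty);
      }

      // The generator consults this before emitting e.g. `name + 1`.
      allowsNumericOp(name: string): boolean {
        return this.vars.get(name) === "number";
      }
    }

    const env = new TypeEnv();
    env.setType("count", "number");
    env.setType("label", "string");
    env.allowsNumericOp("count"); // true  -> the candidate `count + 1` may be emitted
    env.allowsNumericOp("label"); // false -> that candidate is filtered out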

Similarly, static analyzers can be - and have been - applied to dynamically typed languages. If a project is written using typical dynamic-language patterns, though, the inferred types can get very complex, so this tends to work best with codebases written with analysis in mind.
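
For instance, TypeScript's checker can be pointed at plain JavaScript through JSDoc annotations (a small example of my own):

    // @ts-check  (a plain .js file, checked by the TypeScript compiler)

    /** @param {number} n */
    function double(n) {
      return n * 2;
    }

    double("oops"); // tsc: Argument of type 'string' is not assignable to parameter of type 'number'.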


The logic above can support exactly the opposite conclusion: an LLM can do better with dynamically typed languages since it does not need to solve type errors and saves several context tokens.

In practice, LLM-backed coding agents have been reported to work around type errors by using `any` in gradually typed languages like TypeScript. I have personally observed such usage multiple times as well.

I also tried using LLM agents with more strongly typed languages like Rust. When complex type errors occurred, the agents struggled to fix them and eventually just used `todo!()`.

The experiences above may be caused by insufficient training data, but they illustrate the importance of evaluation over ideological speculation.


In my experience you can get around this by adding a linter rule that disallows `any` and a local Claude instructions file telling the agent to fix the lint issues every time it changes something.
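
For TypeScript, a minimal sketch of such a rule, assuming a typescript-eslint flat-config setup:

    // eslint.config.mjs (assumes the typescript-eslint package is installed)
    import tseslint from "typescript-eslint";

    export default tseslint.config(
      ...tseslint.configs.recommended,
      {
        rules: {
          // force real types instead of letting the agent reach for `any`
          "@typescript-eslint/no-explicit-any": "error",
        },
      },
    );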


You can equally get around a significant portion of the purported issues with dynamically typed languages by having Claude run tests, and try to run the actual code.

I have no problem believing they will handle some languages better than others, but I don't think we'll know whether typing makes a significant difference vs. other factors without actual tests.


I always add instructions to have the LLM run `task build` before claiming a task is done.

Build runs linters and tests and actually builds the project, kinda-sorta confirming that nothing major broke.


It does not always work in my experience, due to complex type definitions. Extra tool calls and time are also needed to fix linting.


Or just bad training data. I've seen "any" casually used everywhere.


>The logic above can support exactly the opposite conclusion: an LLM can do better with dynamically typed languages since it does not need to solve type errors and saves several context tokens.

If the goal is just to output code that does not show any linter errors, then yes, choose a dynamically typed language.

But for code that works at runtime? Types are a huge helper for humans and LLMs alike.


Hi, ast-grep author here. Thanks for sharing the video!

The production quality is amazing! I also made some YouTube videos recently. It is not easy, and the end product is far from polished.

Really appreciate your help introducing the tool to more people, thanks!


Thank you! I found ast-grep to be really useful, I hope more people will discover it!


One thing algebraic effects (AE) provide that coroutines do not is type safety. More concretely, AE can specify lexically in code what a function can and cannot do. A generator/coroutine cannot.

For example, a function annotated with, say, `query_db(): User can Database` means the function can call the database, and the caller must provide a `Database` handler to call `query_db`.

Constraining what a function can and cannot do is pretty popular in other programming fields, most notably Next.js: a server component CANNOT use client features and a client component CANNOT access the server database.
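
A rough TypeScript analogy of the constraint (my own sketch, not Ante syntax): the capability shows up in the signature, so the caller has to hand it over.

    // Hypothetical capability-as-argument analogy; names are illustrative.
    interface Database { query(sql: string): unknown; }
    interface User { id: string; name: string; }

    // The `db` parameter plays the role of `can Database`.
    function queryDb(db: Database, id: string): User {
      return db.query(`select * from users where id = '${id}'`) as User;
    }

    // queryDb("42")         // type error: a Database must be supplied
    // queryDb(prodDb, "42") // ok: the caller provides the capability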


Coroutines don't take away type safety any more than function calls do.

But this gets back to what I was saying about generalization - the way I would implement what you're talking about is with coroutines and dynamic scoping. I'm still missing how AE is more general and not something you implement on top of other building blocks.


I think the idea is that you can use it like async/await, except that a function must statically declare which interfaces it is allowed to await on, and the implementations are passed in through an implicit context. I'd be a bit worried that using it widely for capabilities, etc., would just multiply the number of function colors.


> would just multiply the number of function colors.

Would there really be colors?

I mean sure, the caller of an effectful function will either have to handle the effect or become effectful itself, so in this sense effectfulness is infectious.

However, while a function might use the `await` effect, when calling the function you could also just define the effect handler so as to block, instead of deferring the task and jumping back to the event loop. In other words, wouldn't this solve the issue of colors? One would simply define all possibly blocking functions as await-effectful. Whether or not they actually await / run asynchronously would be up to the caller.


The problem is, if you're using them for capabilities, it wouldn't just be an 'Await' effect: it would be an 'AwaitDatabase' effect and an 'AwaitFilesystem' effect and an 'AwaitNetwork' effect and an 'AwaitSubprocess' effect and....

And everything working with generic function objects would have to lug around all these effects, unless the language has a very solid 'effect polymorphism' story.


> unless the language has a very solid 'effect polymorphism' story.

That seems to be the premise, yeah. (See also the comment by the Ante author on polymorphism somewhere here in the thread.)

> The problem is, if you're using them for capabilities, it wouldn't just be an 'Await' effect: it would be an 'AwaitDatabase' effect and an 'AwaitFilesystem' effect and an 'AwaitNetwork' effect and an 'AwaitSubprocess' effect and....

I have to admit I will have to think about this a bit. It's already late over here and my brain is no longer working. :)


The blog post lacks a review of one critical player, effect.ts: https://effect.website/docs/error-management/two-error-types...


Saying other languages have "none of the tooling" in the "why" section sounds like a huge self-roast, since CCL does not have, say, highlighting/LSP/FFI to help adoption.


> Better type information. Because the program is always in a valid state the type checker can always run and give meaningful feedback.

Note that programs can be syntactically well-formed but ill-typed. For example `let x = true + 1` has valid syntax but produces an "undefined" type for the variable `x`, if the type system does not support type error recovery.

A quote from a great POPL 2024 paper, https://hazel.org/papers/marking-popl24.pdf:

> If a type error appears _anywhere_, the program is formally meaningless _everywhere_
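
By contrast, a checker that recovers reports the error locally and keeps checking the rest of the file. TypeScript's tsc behaves this way (exact messages vary by version):

    // checked with `tsc --noEmit`
    const x = true + 1;   // error: Operator '+' cannot be applied to types 'boolean' and 'number'.
    const y: string = 42; // still reported: Type 'number' is not assignable to type 'string'.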


The EYG type system does support recovery, so you will get multiple type errors if that's the case in the program.


My point is that type error recovery is a property of the type system, not of the structural code editor.


This is a pretty fair comment. I wonder how structural editing figures out things like formatting/searching/diffing and copy/paste across different editors.


Those are all concerns orthogonal to how the editor behaves. You could have a structural editor operate on plain text files and a dumb text editor operate on an AST; you can do plain-text diffs of ASTs or diff ASTs parsed from plain text files, etc.

Some of these have existed historically (or even still exist).

Paredit-mode is a structural editor that saves and loads plain text files. Smalltalk environments typically offered a dumb text editor, but code was saved as compiled binaries (which meant you couldn't save your functions if they weren't syntactically correct, though you could have unsaved, syntactically broken ones). Mathematica represents its code in a format that might as well be binary, but copy/paste converts it to plain text. And there's one git plugin (don't remember the name right now) that does syntax-aware diffs even though git deals with plain text.


Since Skip maintains a reactive cache, how is the cache stored? I would speculate it is a distributed in-memory database so the computation service can scale. But I wonder how the system recovers and rebuilds the cache if the cache goes down.


The feature dates back to perhaps 2006, per JetBrains' PDF archive [1]. JetBrains also updated the structural search UI/UX in 2018 [2]. It is possible that users do not buy into the idea of SSR. Alternatively, it may be due to JetBrains SSR's mixed language support [3][4][5].

[1] https://www.jetbrains.com/idea/docs/ssr.pdf [2] https://www.youtube.com/watch?v=YeGPO-UHTbs [3] https://www.jetbrains.com/help/idea/structural-search-and-re... [4] https://www.jetbrains.com/help/go/structural-search-and-repl... [5] https://www.jetbrains.com/help/pycharm/structural-search-and...

