
LLMs are very good at understanding decompiled code. I don't think people have updated on the fact that almost everything is effectively open source now!


Being able to read some iteration of potential source code doesn’t make it open source. Licensing, copyright, build chains, rights to modify and redistribute, etc are all factors.


I'm sympathetic to this view, but I also wonder if this is the same thing that assembly language programmers said about compilers. What do you mean that you never look at the machine code? What if the compiler does something inefficient?


Not even remotely close.

Compilers are deterministic. People who write them test that they will produce correct results. You can expect the same code to compile to the same assembly.

With LLMs, two people giving the exact same prompt can get wildly different results. That is not a tool you can use to blindly ship production code. Imagine if your compiler randomly threw in a syscall to delete your hard drive, or decided to pass credentials in plain text. LLMs can and will do those things.


Even ignoring determinism, with traditional source code you have a durable, human-readable blueprint of what the software is meant to do that other humans can understand and tweak. There's no analogy in the case of "don't read the code" LLM usage. No artifacts exist that humans can read or verify to understand what the software is supposed to be doing.


yeah there is. it's called "documentation" and "requirements". And it's not like you can't go read the code if you want to understand how it works, it's just not necessary to do so while in the process of getting to working software. I truly do not understand why so many people are hung up on this "I need to understand every single line of code in my program" bs I keep reading here, do you also disassemble every library you use and understand it? no, you just use it because it's faster that way.


> do you also disassemble every library you use and understand it?

Sometimes.


> it's called "documentation" and "requirements"

What I mean is an artifact that is the starting point for generating the software. Compiled binaries can be completely thrown away whenever because you know you have a blueprint (the source code) that can reliably reproduce it.

Documentation & requirements _could_ work this way if they served as input to the LLMs that would then go and create the source code from scratch. I don't think many people are using LLMs this way, but I think this is an interesting idea. Maybe soon we'll have a new generation of "LLM-facing programming languages" that are even higher level software blueprints that will be fed to LLMs to generate code.

TDD is also a potential answer here? You can imagine a world where humans just write test suites and LLMs fill out the code to get it to pass. I'm curious if people are using LLMs this way, but from what I can tell a lot of people use them for writing their tests as well.

> And it's not like you can't go read the code if you want to understand how it works

In-theory sure, but this is true of assembly in-theory as well. But the assembly of most modern software is de-facto unreadable, and LLM-generated source code will start going that way too the more people become okay with not reading it. (But again, the difference is that we're not necessarily replacing it with some higher-level blueprint that humans manage, we're just relying on the LLMs to be able to manage it completely)

> I truly do not understand why so many people are hung up on this "I need to understand every single line of code in my program" bs I keep reading here, do you also disassemble every library you use and understand it? no, you just use it because it's faster that way.

I think at the end of the day this is just an empirical question: are LLMs good enough to manage complex software "on their own", without a human necessarily being able to inspect, validate, or help debug it? If the answer is yes, maybe this is fine, but based on my experiences with LLMs so far I am not convinced that this is going to be true any time soon.


Not only that, but compiler optimizations are generally based on rigorous mathematical proofs, so even without testing them you can be pretty sure they will generate equivalent assembly. From the little I know of LLMs, I'm pretty sure no one has figured out what mathematical principles LLMs generate code from, so you can't be sure it's going to be right aside from testing it.


I write JS, and I have never directly observed the IRs or assembly code that my code becomes. Yet I certainly assume that the compiler author has looked at the compiled output in the process of writing a compiler!

For me the difference is prognosis. Gas Town has no ratchet of quality: its fate was written on the wall the day Steve decided he didn't want to know what the code says. It will grow to a moderate but unimpressive size before it collapses under its own weight. Even if someone tried to prop it up with stable infra, Steve would surely vibe the stable infra out of existence, since he does not care about that.


Or he will find a way to get the AI to create harnesses so it becomes stable. The lack of imagination and willingness to experiment in the HN crowd amazes and worries me at the same time. I never thought a group of engineers would be the most conservative and closed-minded people I could discuss with.


It's a paradox, huh? If the AI harness became so stable that it wrote good code, he wouldn't be afraid to look at the code; he would be eager to look at it, right? But if it mattered whether the AI wrote good code or not, he couldn't defend his position that the way to create value with code is quantity over quality. He needs to sell the idea of something only AI can do, which means he needs the system to be made up of a lot of bad or low-quality code that no person would ever want to be forced to look at.


There's a difference between "imagination and willingness to experiment" and "blind faith and gullibility".


Wait till you meet engineers other than software engineers. I'm not even sure most software people should be called engineers, since there are no real accredited standards. I specifically trained as an EE in physical electronics because the other disciplines at the time seemed really rigid.

There's a saying that you don't want optimists building bridges.


The big difference is that compilation is deterministic: compile the same program twice and it'll generate the same output twice. It also doesn't involve any "creativity": a compiler is mostly translating a high-level concept into its predefined lower-level components. I don't know exactly what my code compiles to, but I can be pretty certain what the general idea of the assembly is going to be.

With LLMs all bets are off. Is your code going to import leftpad, call leftpad-as-a-service, write its own leftpad implementation, decide that padding isn't needed after all, use a close-enough rightpad instead? Who knows! It's just rolling dice, so have fun finding out!


> The big difference is that compilation is deterministic: compile the same program twice and it'll generate the same output twice.

That's barely true now. Nix comes close, but builds are only bit-for-bit identical if you set a bunch of extra flags that aren't set by default. The most obvious instability is CPU dispatch order (modern single-computer systems are themselves distributed, racy systems), which changes the generated code ever so slightly.

We don't actually care, because if one compiled version of the code uses r8 for a variable but a different compilation uses r9 for that variable, it doesn't matter: we just assume the resulting binary works the same either way. r8 vs. r9 is an implementation detail that doesn't matter to humans.

See where I'm going with this? If the LLM non-deterministically calls the variable fileName one day, and file_name the next time it's given the same prompt, language syntax purists will suffer an aneurysm because one of those is clearly "wrong" for the language in use, but it's really more of an implementation detail at this point. Obviously you can't mix them; the generated code has to be consistent in which one it's using. But if compilers get to choose r8 one day and r9 the next, and we're fine with it, why is having the exact variable name that important, as long as it's being used correctly?


I’ve done builds for aerospace products where the only binary difference between two builds of the same source code is the embedded timestamp. And per FAA review guidelines, this deterministic attribute is required, or else something is wrong in the source code or build process.

I certainly don’t use all compilers everywhere, but I don’t think determinism in compilation is especially rare.


If your builds are not deterministic for the same set of inputs, you are doing something wrong, or you are the victim of a supply chain attack.

https://reproducible-builds.org/
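A practical sanity check, sketched in Python (the artifact paths are hypothetical; `build` itself is whatever your toolchain does): run the same build twice and compare cryptographic digests of the outputs.

```python
import hashlib

def sha256_file(path):
    # Stream the file so large artifacts don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def builds_reproducible(artifact_a, artifact_b):
    # Bit-for-bit identical output is the reproducible-builds criterion;
    # equal digests from two independent builds is the practical test.
    return sha256_file(artifact_a) == sha256_file(artifact_b)
```

Note that embedded timestamps (like the one in the aerospace builds mentioned above) have to be pinned or excluded first, e.g. via the `SOURCE_DATE_EPOCH` convention from reproducible-builds.org.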


No, some compilers aren't deterministic by design, e.g. because they compile stuff in parallel and don't take extra steps to enforce consistent ordering of things (because it doesn't matter).


The compiler is deterministic and the translation does not lose semantics. The meaning of your code is an exact reflection of what is produced.


We can tell you weren't around for the advent of compilers. To be fair, neither was I, since the UNIX C compiler came out in the early '70s and was far from the first compiler. Modern compilers you can make that claim about, but early compilers weren't deterministic.


I've been programming since 6502/6510 assembly language and all compilers I've used were deterministic (which isn't the same thing as being bug free or producing the correct output for a given input).


Bullshit.


All compilers have bugs. Any loss of semantics during compilation would be considered a bug. In order to do that, the source and target language need to be structured and specified. I wasn't around in the 60s either, but I think that hasn't changed.


Which early compilers were nondeterministic?


This analogy has always been bad any time someone has used it. Compilers directly transform via known algorithms.

Vibecoding is literally just random probabilistic mapping between unknown inputs and outputs on an unknown domain.

Feels like saying that because I don't know how my engine works, my car could have just been vibe-engineered. People have put thousands of hours into making certain tools work up to a given standard and spec, reviewed by many, many people.

"I don't know how something works" != "This wasn't thoughtfully designed"

Why do people compare these things.


No, it is not what assembly programmers said about compilers, because you can still look at the compiled assembly, and if the compiler makes a mistake, you can observe it and work around it with inline assembly or, if the source is available, improve the compiler. That is not the same as saying "never look at the code".


I feel like this argument would make a lot more sense if LLMs had anywhere near the same level of determinism as a compiler.


>but I also wonder if this is the same thing that assembly language programmers said about compilers

But as a programmer writing C code, you're still building out the software by hand. You're having to read and write a slightly higher level encoding of the software.

With vibe coding, you don't even deal with encodings. You just prompt and move on.


I've wondered if people who write detailed specs, are overly detailed, are in a regulated industry, or even work with offshore teams have success more quickly simply because they start with that behavior. Maybe they have a tendency to dwell before moving on, which may be slightly more iterative than someone who vibecodes straight through.


I wonder if assembly programmers felt this way about the reliability of the electrical components their code relies upon...


I wonder if electrical engineers felt this way about the reliability of the silicon crystal lattice their circuits rely upon…


I used to work with a QA person who really drove me nuts. They would misunderstand the point of a feature, and then write pages and pages of misguided commentary about what they saw when trying to test it. We'd repeat this a few times for every release.

This forced me to start making my feature proposals as small as possible. I would defensively document everything, and sprinkle in little summaries to make things as clear as possible. I started writing scripts to help isolate the new behavior during testing.

...eventually I realized that this person was somehow the best QA person I'd ever worked with.


How did misunderstanding a feature and writing pages on it help? I'm not sure I follow the logic of why this made them a good QA person. Do you mean the features were not written well, so writing code for them was going to produce errors?


In order to avoid the endless cycle with the QA person, I started doing this:

> This forced me to start making my feature proposals as small as possible. I would defensively document everything, and sprinkle in little summaries to make things as clear as possible. I started writing scripts to help isolate the new behavior during testing.

Which is what I should have been doing in the first place!


If a QA person (presumably familiar with the product) misunderstands the point of a feature how do you suppose most users are going to fare with it?

It's a very clear signal that something is wrong with either how the feature was specified or how it was implemented. Maybe both.


I took GPs meaning that the QA person in question sucked, but them being the best meant the other QA folks they've worked with were even worse.


Let's call the person in question Alex. Having to make every new feature Alex-proof made all of the engineers better.


Did it? Sounds like making things "Alex proof" may have involved a large amount of over-engineering and over-documenting.


That's not at all what they meant. They meant they ended up raising their own quality bar tremendously because the QA person represented a ~P5 user, not a P50 or P95 user, and had to design around misuse & sad path instead of happy path, and doing so is actually a good quality in a QA.


It's possible but I'd guess they are probably not worse than the average user.


I worked with someone a little while ago who tended to do this: point out things that weren't really related to the ticket. And I was happy with their work. I think the main thing to remember is that the following are two different things:

- Understanding what is important to / related to the functionality of a given ticket

- Thoroughly testing what is important to / related to the functionality of a given ticket

Sure, the first one can waste some time by causing discussion of things that don't matter. But being REALLY good at the second one can mean far fewer bugs slip through.


Most of the time QA should be talking about those things to the PM, and the PM should get the hint that the requirements needed to be more clear.

An under-specified ticket is something thrown over the fence to Dev/QA just like a lazy, bug-ridden feature is thrown over the fence to QA.

This does require everyone to be acting honestly to not have to belabor the obvious stuff for every ticket ('page should load', 'required message should show', etc.). Naturally, what is 'obvious' is also team/product specific.


I think noticing other bugs that aren't related to the ticket at hand is actually a good thing. That's how you notice them: by "being in the area" anyway.

What many QAs can't do, and what for me separates the good ones from the not-so-good ones, is understanding when bugs are unrelated and reporting them as separate issues to be tackled independently, instead of starting long discussions on the current ticket at hand.


So, QA should notice that the testers are raising tickets like this and step in to give them some guidance on what and how they are testing. I've worked with a client's test team who were not given any training on the system, so they were raising bugs like spam-clicking a button 100 times, quickly resizing the window 30 times, or pasting War and Peace. We gave them some training and direction, and they started to find problems that actual users would be finding.


I didn't mean reporting things that you wouldn't consider a bug and just close. FWIW tho, "Pasting War and Peace" is actually a good test case. While it is unlikely you need to support that size in your inputs, testing such extremes is still valuable security testing. Quite a few things are security issues, even though regular users would never find them. Like permissions being applied in the UI only. Actual users wouldn't find out that the BE doesn't bother to actually check the permissions. But I damn well expect a QA person to verify that!

What I meant, though, were actual problems / bugs in the area of the product that your ticket is about, but that weren't caused by your ticket / have nothing to do with that ticket directly.

Like to make an example, say you're adding a new field to your user onboarding that asks them what their role is so that you can show a better tailored version of your onboarding flows, focusing on functionality that is likely to be useful for you in your role. While testing that, the QA person notices a bug in one of the actual pieces of functionality that's part of said onboarding flow.

A good QA understands and can distinguish what is a pre-existing bug and what isn't and report it separately, making the overall product better, while not wasting time on the ticket at hand.


Ha, that's certainly a way to build things fool-proof.


There's a typo in the URL here: > If you have a topic in mind but are not sure if it is suitable for Paged Out!, check out the Writing Articles page or contact us

It links to `?page=writing.pho` rather than `.php`


If I were them I would do that typo on purpose.


Whoops, thanks! We'll fix it in a moment.


(channeling Patrick McKenzie) If you have an S&P 500 index fund, you're a shareholder in Microsoft. Call their Investor Relations people, or send them a letter with this description. They will probably be of some help!


Agree on video games! I recently found a "developer photo insert" Easter egg in an old 3DO game: https://32bits.substack.com/p/under-the-microscope-total-ecl...


Fantastic game!

The same development studio, Sega AM2, recently had a developer reveal that he had put an Easter egg into Fighters Megamix for Saturn. However, he mistakenly introduced a crash bug in it.

This set me off looking for the Easter egg. After a couple days of reverse engineering, I finally found it [0]! I love looking for this stuff.

[0] https://32bits.substack.com/p/bonus-fighters-megamix


Really cool! I'll play with this to see if I can come up with some missing hashes for Tony Hawk 3.


In this case it's probably smarter to resort to brute force.

Here's a C program that will run a lot faster than the Python. On my M1 Max MacBook Pro, I can evaluate all 9-button combos in 5.2 seconds. Each extra button should increase the runtime by a factor of 8. Allowing up to n repetitions should multiply the runtime by n. So you should be able to evaluate virtually all combinations in like 20 minutes without further acceleration.

https://gist.github.com/rgov/f471423e13e955c074ba9bac36c961b...
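The shape of that brute force is simple; here's a minimal Python sketch of the enumeration loop. The button alphabet and the hash are placeholders (the FNV-1a-style mix below is NOT the game's real function): substitute the actual per-button hash from the article.

```python
from itertools import product

# Hypothetical alphabet: Up/Down/Left/Right/X/Square/Triangle/Circle
BUTTONS = "UDLRXSTC"

def combo_hash(seq):
    # Placeholder 64-bit hash; swap in the game's real function here.
    h = 0xCBF29CE484222325
    for b in seq:
        h = ((h ^ ord(b)) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def search(target, max_len):
    # Try every sequence of 1..max_len buttons until the hash matches.
    for n in range(1, max_len + 1):
        for combo in product(BUTTONS, repeat=n):
            if combo_hash(combo) == target:
                return "".join(combo)
    return None
```

The 8^n blow-up in the inner loop is why each extra button multiplies the runtime by the alphabet size.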


If you feel like a challenge, Tony Hawk's Pro Skater 3 has some hashes with unknown inputs. The function is the same as the one in the article, and one target is 1eca8e89ad2dc1d6.


Try: TRIANGLE, UP, X, SQUARE, UP, X, CIRCLE, UP, X


Wow, nice work!

The other missing ones are: A75CA25CF4498F87 8FE0C6AA7CE60CEC B343B58CF0B72493 E0B20BEDFA0AC685 EFCC5A6FD62EC6D8


I tried sequences of up to 10 buttons with up to 5 repetitions, and then 11 buttons with no repetitions.

    1ECA8E89AD2DC1D6 = TUXSUXCUX
    A75CA25CF4498F87 = TDTRUTDTRU
    8FE0C6AA7CE60CEC = RLRLSLRRLRLSLR
    B343B58CF0B72493 = SSSCCSC
    E0B20BEDFA0AC685 = RCDXLSUTDX
    EFCC5A6FD62EC6D8 = no solution found
Curious to know what they do!


Excellent - these seem indeed to work. I'll try to figure out what they do.

I'll write up the results eventually. I sent you a note about crediting your contribution here. Many thanks!


Yeah, for sure. It's as expensive to generate the permutations as it is to do the hashing in this case!


And another thing I noticed: because the hash is built button by button, you can reuse part of the state when checking sequences. So if you're checking a 10-button sequence, you get all of its prefixes almost for free (you just need a comparison after every step). Getting to 18 buttons of length is still a lot of calculation, though.
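A sketch of that idea in Python: carry the running hash state down a depth-first enumeration, so each one-button extension costs a single update and every prefix gets checked along the way. The `update` function is a stand-in for the game's real per-button hash step.

```python
def incremental_search(target, buttons, max_len, init_state, update):
    # Depth-first walk over button sequences; update(state, button) folds
    # one button into the running hash, so prefixes are never re-hashed.
    stack = [("", init_state)]
    while stack:
        seq, state = stack.pop()
        if seq and state == target:
            return seq
        if len(seq) < max_len:
            for b in buttons:
                stack.append((seq + b, update(state, b)))
    return None
```

Compared with re-hashing every candidate from scratch, this turns the per-candidate cost into a single state update, at the price of keeping partial states on the stack.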


Good point - the dictionary attack produces some permutations that are too long, but it doesn't matter because you get the effect as soon as the final character of the correct code is entered.

