Hacker News | Retr0id's comments

I'd love to see some graphs

Would it matter? Even before AI, most papers couldn't be replicated. Do we really think this is going to help the situation? Even if some of the AI papers are amazing, will anyone ever read them if most of the papers are useless? More research != more useful research. This is the logical outcome of publish or perish and Q-rankings being the main metric used.

"When a measure becomes a target, it ceases to be a good measure" - Goodhart's law


Did you reply to the wrong thing?

Because the company either has to address it, or stop pretending it's "listening to concerns" or whatever. Even if it doesn't change the outcome, it makes it clearer that the company is engaging in bad faith.

We know how to do hardware-bound phishing-resistant credentials now, it is a solved problem.

I'm going to assume you're referring to auth codes, especially the ones sent via SMS? In which case yes, banks should definitely stop using those but that alone doesn't solve the overarching issue.

The next step is simply that the scammer modifies the official bank app, adds a backdoor to it, and convinces the victim to install that app and log in with it. No hardware-bound credentials are going to help you with that; the only fix is attestation, which brings you back to the aforementioned issue of blessed apps.


SMS 2FA is neither hardware-bound nor phishing resistant, I'm referring to hardware-bound phishing-resistant 2FA methods like passkeys.

Read my previous comment again. Passkeys are nice, but they don't solve the problem that's being discussed here.

I'm not sure if you understand what makes passkeys phishing-resistant?

The backdoored version of the app would need to have a different app ID, since the attacker does not have the legitimate publisher's signing keys. So the OS shouldn't let it access the legitimate app's credentials.


I understand how passkeys work. You don't need the legitimate app's credentials; we're talking about phishing attacks, where you're trying to bring the victim to give you access to or control of their account without them realizing that's what is happening.

A simple scenario adapted from the one given in the Android blog post: the attacker calls the victim and convinces them that their banking account is compromised, and they need to act now to secure it. The scammer tells the victim that their account got compromised because they're using an outdated version of the banking app that's no longer supported. He then walks them through "updating" their app, effectively going through the "new device" workflow - except the new device is the same as the old one, just with the backdoored app.

You can prevent this with attestation of course, essentially giving the bank's backend the ability to verify that the credentials are actually tied to their app, and not some backdoored version. But now you have a "blessed" key that's in the hands of Google or Apple or whomever, and everyone who wants to run other operating systems or even just patched versions of official apps is out of luck.


> He then walks them through "updating" their app, effectively going through the "new device" workflow - except the new device is the same as the old one, just with the backdoored app.

This is where the scheme breaks down: the new passkey credential can never be associated with the legitimate RP. The attacker will not be able to use the credential to sign in to the legitimate app/site and steal money.

The attacker controls the fake/backdoored app, but they do not control the signing key which is ultimately used to associate app <-> domain <-> passkey, and they do not control the system credentials service which checks this association. You don't even need attestation to prevent this scenario.


> do not control the signing key which is ultimately used to associate app <-> domain <-> passkey, and they do not control the system credentials service which checks this association.

You're assuming the attacker must go through the credential manager and the backing hardware, but that is only the case with attestation. Without it, the attacker can simply generate their own passkey in software, because the backend on the bank's side would have no way of telling where the passkey came from.


How did the service authenticate the user in order to create the new credential within the attacker-controlled app?

With banks, typically a combination of your account number, PIN, and some confirmation code sent via email or SMS. And of course unregistering your previous device. Not sure where you're going with this, though?

I am just pointing out that you are essentially saying passkeys can be phished because banks can allow phishable credentials to bypass passkeys.

I never said that passkeys can be phished; I said they don't solve this problem, but yeah. Locking the front door while leaving the back door wide open, as they say. But unless you can convince people to go to the bank counter every time they change their phone, that's life.

> I understand how passkeys work. You don't need the legitimate app's credentials; we're talking about phishing attacks, where you're trying to bring the victim to give you access to or control of their account without them realizing that's what is happening.

That doesn't work, because the scammer's app will be signed with a different key, so the relying party ID is different and the secure element (or whatever hardware backing you use) refuses to do the challenge-response.


Correction: nothing prevents the attacker from using the app's legit package ID other than requiring the uninstall of the existing app.

The spoofed app can't request passkeys for the legit app because the legit app's domain is associated with the legit app's signing key fingerprint via .well-known/assetlinks.json, and the CredentialManager service checks that association.


If the sideloaded app does not have permission to use the passkeys, and cannot somehow get the user to approve passkey access for the new app, that would be a good alternative that still allows custom apps.

I don't think you understand. This exists _today_, regardless of how you install apps, because attackers can't spoof app signatures. If I don't have Bank of America's private signing key, I cannot make an app that requests passkeys for bankofamerica.com, because bankofamerica.com publishes a file [0] that says "only apps signed with this key fingerprint are allowed to request passkeys for bankofamerica.com" and Android's credential service checks that file.

No need for locking down the app ecosystem, no need to verify developers. Just don't use phishable credentials and you are not vulnerable to malware trying to phish credentials.

0: https://www.bankofamerica.com/.well-known/assetlinks.json
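The check described above can be sketched in a few lines. This is a hypothetical Python model of the fingerprint lookup: the document structure follows the Digital Asset Links format, but the package name and fingerprints below are made up, and a real credential service also verifies the app's actual signing certificate via the package manager.

```python
import json

# Illustrative Digital Asset Links document, modeled on the format served
# at https://<domain>/.well-known/assetlinks.json (values are made up).
ASSETLINKS = json.loads("""
[{
  "relation": ["delegate_permission/common.get_login_creds"],
  "target": {
    "namespace": "android_app",
    "package_name": "com.example.bank",
    "sha256_cert_fingerprints": ["AA:BB:CC:DD"]
  }
}]
""")

def fingerprint_allowed(assetlinks, package_name, cert_fingerprint):
    """Return True if this signing-cert fingerprint is listed for the
    package -- the association the credential service checks before
    handing out passkeys for the domain."""
    for statement in assetlinks:
        target = statement.get("target", {})
        if (target.get("namespace") == "android_app"
                and target.get("package_name") == package_name
                and cert_fingerprint in target.get("sha256_cert_fingerprints", [])):
            return True
    return False

# The legitimate app, signed with the listed key, passes:
print(fingerprint_allowed(ASSETLINKS, "com.example.bank", "AA:BB:CC:DD"))  # True
# A repackaged app signed with the attacker's key does not:
print(fingerprint_allowed(ASSETLINKS, "com.example.bank", "EE:FF:00:11"))  # False
```

The key point is that the attacker can reuse the package name but not the signing key, so the fingerprint check fails.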


> the malware captures their two-factor authentication codes

Aren't we supposed to have sandboxing to prevent this kind of thing? If the malware relies on exploiting n-days on unpatched OSes, they could bypass the sideloading restrictions too.


Codes arrive via SMS, which is available to all apps with the READ_SMS permission. This isn't an OS vuln. It is a property of the fact that SMS messages are delivered to a phone number and not an app.

On the Play Store there are a bunch of annoying checks for apps that request READ_SMS, to prevent this very thing. Off Play, such defenses are impossible.


If they restricted sideloaded apps from sniffing SMS then I wouldn't mind all that much.

There are about a half dozen permissions that are regularly abused by malware. These permissions are also extremely useful for a ton of completely legitimate features.

I am pretty confident that if Google had enabled this policy only for apps which use these permissions that the community would still be upset.


So no access to SMS for apps distributed on F-Droid?

Fine by me, what are people using SMS for in 2026 except for spam and sending 2FA codes insecurely?

(I'm being facetious here but this is massively preferable to disabling sideloading altogether)


> sideloading

If you care about the topic, which you seemingly do, stop using this doubleplusgood term.


Only require Developer Registration for apps with READ_SMS then.

There are about a half dozen permissions that are regularly abused by malware. These permissions are also extremely useful for a ton of completely legitimate features.

I am pretty confident that if Google had enabled this policy only for apps which use these permissions that the community would still be upset.


> you can already do sanitation by writing a function to check input before passing it to innerHTML

This is like saying C is memory safe as long as your code doesn't have any bugs.

More saliently, it does not consider parser differentials.
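To illustrate the first point, here is a hypothetical Python sketch (not from the linked discussion) of a removal-based "sanitizer" and the classic bypass against it, where stripping a tag reassembles another one:

```python
# A naive removal-based "sanitizer" of the kind the parent comment warns
# about. Any real sanitizer must operate on a parsed tree, not raw text.
def naive_sanitize(html):
    return html.replace("<script>", "").replace("</script>", "")

# Removing the inner tags splices the outer fragments back together:
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
cleaned = naive_sanitize(payload)
print(cleaned)  # <script>alert(1)</script>
```

Parser differentials are worse still: even a check that is correct against one HTML parser can disagree with how the browser's parser (or a re-serialization round trip) interprets the same bytes.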


I'm surprised by your benchmark results.

I've considered building this exact thing before (I think I've talked about it on HN even), but the reason I didn't build it was because I was sure (on an intuitive level) the actual overhead of the SQL layer was negligible for simple k/v queries.

Where does the +104% on random deletes (for example) actually come from?


Fair skepticism — I had the same intuition going in.

The SQL layer overhead alone is probably small, you're right. The bigger gain comes from a cached read cursor. SQLite opens and closes a cursor on every operation. SNKV keeps one persistent cursor per column family sitting open on the B-tree. On random deletes that means seek + delete on an already warm cursor vs. initialize cursor + seek + delete + close on every call.

For deletes there's also prepared statement overhead in SQLite — even with prepare/bind/step/reset, that's extra work SNKV just doesn't do.

I'd genuinely like someone else to run the numbers. Benchmark source is in the repo if you want to poke at it — tests/test_benchmark.c on the SNKV side and https://github.com/hash-anu/sqllite-benchmark-kv for SQLite. If your results differ I want to know.
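For anyone who wants a quick feel for the per-operation cost being discussed, here is a rough Python sqlite3 sketch of the random-delete workload. The schema, pragmas, and row count are illustrative; this is not the author's C harness, and Python's own overhead will dominate the numbers.

```python
import random
import sqlite3
import time

N = 10_000
db = sqlite3.connect(":memory:")
db.execute("PRAGMA journal_mode=WAL")  # ignored for :memory:, shown for parity
db.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB) WITHOUT ROWID")
with db:
    db.executemany("INSERT INTO kv VALUES (?, ?)",
                   ((str(i).encode(), b"x" * 64) for i in range(N)))

keys = [str(i).encode() for i in range(N)]
random.shuffle(keys)

start = time.perf_counter()
with db:
    for k in keys:
        # Each call pays the per-statement cost under discussion:
        # prepare (or fetch from the statement cache), bind, step, reset.
        db.execute("DELETE FROM kv WHERE k = ?", (k,))
elapsed = time.perf_counter() - start

remaining = db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
print(f"{N} random deletes in {elapsed:.3f}s, {remaining} rows left")
```

Comparing this shape of loop against a persistent-cursor engine is what the claimed +104% is about; the C benchmarks in the repos are the authoritative version.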


What does "column family" mean in this context?

A named key space within the same database file — keys in "users" don't collide with keys in "sessions" but both share the same WAL and transaction.
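One way to picture that (a hypothetical sketch, not SNKV's actual on-disk layout) is a composite key in a single SQLite file, with both families committing in one transaction:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Each column family is a separate key space, but all of them share one
# database file, one WAL, and one transaction.
db.execute("CREATE TABLE kv (cf TEXT, k TEXT, v TEXT, PRIMARY KEY (cf, k))")

with db:  # a single transaction covering writes to both families
    db.execute("INSERT INTO kv VALUES ('users',    'alice', 'u1')")
    db.execute("INSERT INTO kv VALUES ('sessions', 'alice', 's9')")

# The same key in different families does not collide:
rows = db.execute("SELECT cf, v FROM kv WHERE k = 'alice' ORDER BY cf").fetchall()
print(rows)  # [('sessions', 's9'), ('users', 'u1')]
```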

Did you measure the performance impact of having multiple trees in a single file vs. having one tree per file? I'd assume one per file is faster, is that correct?

No, I don't know about that. I will check it out.

Are you using ai for the comment replies too?!

Everyone knows the emdash is a giveaway, and they are being left in

are you reading what you're writing?

It's a nonstop slop funnel as far as I can tell. Only ashamed I've been here for more than 5 minutes.


It "only" doubles performance so the overheads aren't that heavy



I used this: https://github.com/hash-anu/sqllite-benchmark-kv/blob/main/t..., in which I called sqlite3_exec only for setting PRAGMAs; for the main operations I never used it. I used only the prepare/bind/step/reset functions, so the gap isn't statement compilation overhead and the comparison is fair.

What are some examples of when buggy code can be tolerated?

Leaf code. Anything that you won't have to build upon long-term and is not super mission critical. Data visualizers, dashboards, internal tools.

Pretty much everywhere an 80%-working tool is better than no tool, and where without AI the opportunity cost to write the tool would be too high.


From recent personal examples

We have some somewhat complicated OpenSearch reindexing logic, and we had an issue where reindexing happened more regularly than it should. I vibecoded a dashboard visualizing in a graph exactly which index gets reindexed when, and into what. The code works, a little rough around the edges, but it serves the purpose and saved me a ton of time.

Another example: in an internal project we made a recent change where we need to send specific headers depending on the environment. Mostly GET endpoints, where my workflow is checking the API through the browser. The list of headers is long, but predetermined. I vibecoded an extension that lets you pick the header and allows me to keep my regular workflow, rather than Postman or cURL or whatever. A little buggy UI, but good enough. The whole team uses it.

I'm not a frontend developer and either of these would take me a lot of time to do by hand


You are setting up to say "I wouldn't tolerate that" for any example given, but if you look at the market and what makes people actually leave, instead of what makes people complain, then basically anything that isn't life-and-death, safety critical, big-money-losing, or data corrupting is tolerable. There's plenty of complaints about Microsoft, Apple, Gmail, Android, and all kinds of 3rd party niche business systems.

[Edit: DanLuu "one week of bugs": https://danluu.com/everything-is-broken/ ]

All the decades people tolerated blue-screens on Windows. All the software which regularly segfaulted years ago. The permeation of "have you tried turning it off and on again" into everyday life. The "ship sooner, patch later" culture. The refusal to use garbage collected or memory managed languages or formal verification over C/C++/etc because some bugs are more tolerable than the cost/effort/performance costs to change. Display and formatting bugs, e.g. glitches in video games. When error conditions aren't handled - code that crashes if you enter blank parameters. Bugs in utility code that doesn't run often like the installer.

One software I installed yesterday told me to disable some Windows services before the install, then the installer tried to start the services at the end of the install and couldn't, so it failed and finished without finishing installing everything. This reminded me that I knew about that, because that buggy behaviour has been there for years and I've tripped over it before; at least two major versions.

Another one I regularly update tells me to close its running processes before proceeding with the install, but after it's got to that state, it won't let me proceed and it has no way to refresh or rescan to detect that the running process has finished. That's been there for years and several major versions as well.

One more famous example is """I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I'll just restart Apache every 10 requests.""" - Rasmus Lerdorf, creator of PHP. I've a feeling something similar was admitted about 37signals and Basecamp, that it was common to restart Ruby on Rails processes frequently, but I can't find a source to back that up.


If the code is being used by a small group of people who are willing to figure out and share workarounds for those bugs - internal staff, for example.

Aren't you also paying internal staff for their time? Wasting their time is wasting your money.

I've been in these situations before. If there's a known bug in an internal tool that would take the development team a day to investigate and fix - aka $10,000s - it's often smarter to send around an email saying "don't click the Froople button more than once, and if you do tell Benjamin and he'll fix it in the database for you".

Of course LLMs change that equation now because the fix might take a few minutes instead.


> If there's a known bug in an internal tool that would take the development team a day to investigate and fix - aka $10,000s - it's often smarter to send around an email saying "don't click the Froople button more than once, and if you do tell Benjamin and he'll fix it in the database for you".

How much will Benjamin's time responding to those calls cost in the long run?


Hopefully none, because your staff will read the email and not click the button more than once.

Or one of them will do it, Benjamin will glare at them and they'll learn not to do it again and warn their coworkers about it.

Or... Benjamin will spend a ton of time on this and use that to successfully argue for the bug to get fixed.

(Or your organization is dysfunctional and ends up wasting a ton of money on time that could have been saved if the development team had fixed the bug.)


> development team a day to investigate and fix - aka $10,000s

What about the non-fictional 99.999999999% of the world that doesn't make $1000/hour?


Large companies are often very bad at organizing work, to the tune of increasing the cost of everything by a large multiple over what you'd think it should be. Most of that cost wouldn't be productive developer time.

It costs them single digit thousands instead.

The alternative is the staff having no software at all to help with their task which wastes even more of their time.

Points at the public sector

Any large enterprise. The software organisations write for themselves is pretty dire in most cases even without AI.

Does it produce runnable binaries?

The repo does not make it clear, but Apple ships Linux builds of Rosetta 2 that can be used inside Linux VMs on Apple Silicon hardware [0]. With some patches (or so I'm told) it can be made to run on non-Apple arm64 hardware.

Even if it's not fully decompiled yet, it should be possible to relink the decompiled subsections into an original binary.

[0]: https://developer.apple.com/documentation/virtualization/run...


> Does it produce runnable binaries?

No. Even the decompiled version is incomplete - there's comments all over it which signal missing code like "could not recover jumptable ... too many branches". The "refactored" version is wildly speculative - it looks more like a very clumsy attempt to write a new translator than to reverse-engineer an existing one.

> With some patches (or so I'm told) it can be made to run on non-apple-silicon arm64 hardware.

With the huge caveat that the generated code will expect TSO to be enabled, and may malfunction on non-TSO ARM systems, particularly when running multithreaded code. (Most ARM systems are non-TSO; Apple Silicon has an MSR to enable TSO.)


WIP ;) The final target might be to get Intel's Houdini-like binary (but for Intel instructions)

re: patches, looks like they've reversed some of the relevant bits: https://github.com/Inokinoki/attesor/commit/233cb459b9db8345... (I was concerned this might be slop but that detail is promising!)

That looks more like the AI inventing code to explain observed behavior (cf. "For Linux virtualization environments, we simulate this...").

Yeah, I guess it's losing some context. It still needs human work if you want to make it really work on Linux...

Looking closer it does look pretty nonsensical, ugh.

While it is definitely slop, I think the numbers may be "real" but compiler-dependent. The 20000x "speedup" presumably only happens when the compiler detects that it can optimize the whole algorithm into a nop, because it has no observable side effects. (I have not tested this hypothesis)

On my system, under gcc 15.2.1, the two necessary factors to see the claimed speedup are:

- An optimization level of -O1 or higher
- The -ffinite-math-only flag, or any flag (e.g. -ffast-math) which implies this flag.

The benchmark uses a default value for weights which is #define-d as `__builtin_inf()`, and assigns this value in multiple places. This, of course, is concerning, since it gives a very obvious means by which the benchmark might be completely optimized out, though a more careful analysis would be needed to explain why the Dijkstra and (Res) functions don't also get optimized out.

For clang, the equivalent flags are

- An optimization level of -Og or higher
- The -fno-honor-infinities flag, or any flag (e.g. -ffast-math) which implies this flag.

Notably, while the author enables LTO, and others have compiled with -march=native, neither flag is necessary (for me) to see the huge speedup, which on my machine peaks at over 1.2 million times.


Maybe, but I think the OP is submitting this in bad faith (or got utterly bamboozled by the AI). I tried with recent clang and the specified flags, and the behavior is the same.

(I think it's unlikely that it can be straight-up optimized out because it dirties the workspace thread local, and compilers generally don't optimize out writes to thread local variables.)


A possible explanation for the difference is that the Dijkstra impl being tested is doing multiple heap allocations at every step, which doesn't seem like a fair comparison.

Secondarily I believe in the case of the "opt" version, the compiler might be optimising out the entire algorithm because it has no observable side-effects.

