Baba Is You is a great game, part of a broader family of 2D grid puzzle games.
(Shameless plug: I'm one of the developers of Thinky.gg (https://thinky.gg), a puzzle game site with a 'shortest path'-style game [Pathology] and a Sokoban variant [Sokoath].)
These games are typically NP-hard, so the techniques solvers have employed for Sokoban (or Pathology) have typically been brute-force search with various heuristics (BFS, deadlock detection, Zobrist hashing). However, once levels get beyond a certain size with enough movable blocks, you exhaust memory pretty quickly.
These types of games are still "AI proof" so far, in that LLMs are absolutely awful at solving them while humans are very good (so they seem reasonable candidates for ARC-AGI benchmarks). Whenever a new reasoning model gets released, I typically try it on some basic Pathology levels (like 'One at a Time' https://pathology.thinky.gg/level/ybbun/one-at-a-time) and they fail miserably.
Simple level code for the above level (0 is empty floor, 1 is a wall, 2 is a movable block, 3 is the exit, 4 is the starting position):
000
020
023
041
Similar to OP, I've found Claude can't manage rule dynamics, blocked paths, or game objectives well, and spits out essentially random moves.
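Levels this small are trivial to brute force, though. Here's a minimal BFS sketch over the level code above, assuming simplified Pathology-like rules (the player pushes a block one cell at a time; a block can't be pushed into a wall, into another block, off the grid, or onto the exit); the real game's rules may differ:

```python
from collections import deque

def parse(level_str):
    grid = level_str.split()
    walls, blocks = set(), set()
    player = exit_pos = None
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "1": walls.add((r, c))
            elif ch == "2": blocks.add((r, c))
            elif ch == "3": exit_pos = (r, c)
            elif ch == "4": player = (r, c)
    return len(grid), len(grid[0]), walls, blocks, player, exit_pos

def solve(level_str):
    rows, cols, walls, blocks, player, exit_pos = parse(level_str)
    inside = lambda p: 0 <= p[0] < rows and 0 <= p[1] < cols
    start = (player, frozenset(blocks))
    seen, q = {start}, deque([(start, 0)])
    while q:
        (pos, blks), dist = q.popleft()
        if pos == exit_pos:
            return dist  # number of moves in a shortest solution
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            npos = (pos[0] + dr, pos[1] + dc)
            if not inside(npos) or npos in walls:
                continue
            nblks = blks
            if npos in blks:  # stepping into a block pushes it
                dest = (npos[0] + dr, npos[1] + dc)
                if not inside(dest) or dest in walls or dest in blks or dest == exit_pos:
                    continue  # push target must be free floor
                nblks = (blks - {npos}) | {dest}
            state = (npos, nblks)
            if state not in seen:
                seen.add(state)
                q.append((state, dist + 1))
    return None  # unsolvable under these rules

level = "000\n020\n023\n041"
print(solve(level))
```

The state here is (player position, set of block positions), which is exactly where memory blows up on bigger levels: the state count grows combinatorially with the number of movable blocks.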
NP-hardness isn't much of a problem, because the levels are fairly small, and instances aren't chosen to be worst-case hard but to be entertaining for humans to solve.
SMT/SAT solvers or integer linear programming can get you pretty far. Many classic puzzle games like Minesweeper are NP-hard, yet you can solve any instance a human could solve in their lifetime fairly quickly on a computer.
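As a toy illustration of the constraint-solving idea, here's exhaustive model enumeration on made-up Minesweeper-style clues; a SAT solver reaches the same deductions without the exponential enumeration:

```python
from itertools import product

def deduce(cells, constraints):
    """Enumerate every mine assignment consistent with all count
    constraints, then report cells whose status is forced in every model."""
    models = []
    for bits in product((0, 1), repeat=len(cells)):
        asn = dict(zip(cells, bits))
        if all(sum(asn[c] for c in group) == count for group, count in constraints):
            models.append(asn)
    if not models:
        return set(), set()  # contradictory clues
    forced_mines = {c for c in cells if all(m[c] for m in models)}
    forced_safe = {c for c in cells if not any(m[c] for m in models)}
    return forced_mines, forced_safe

# Hypothetical clues: exactly 1 mine among {A, B}, exactly 2 among {A, B, C}.
mines, safe = deduce(["A", "B", "C"], [({"A", "B"}, 1), ({"A", "B", "C"}, 2)])
print(mines, safe)  # C is forced to be a mine; A and B stay ambiguous
```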
As a developer of a grid-based puzzle game site (https://thinky.gg; one of its games, Pathology, has you go from point A to point B in the fewest steps), I've found A* fascinating not just for the optimization but for the various heuristics that can be built on top of it to generalize it to other types of pathfinding.
Some devs have built solvers that use techniques like bidirectional search, precomputed pattern databases, and deadlock detection.
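For reference, here's a bare-bones A* sketch on a 4-connected grid using a Manhattan-distance heuristic (admissible here since every step costs 1); the grid and walls are made up for illustration:

```python
import heapq

def astar(grid, start, goal):
    """A* shortest path on a 4-connected grid; '#' cells are walls.
    Returns the number of steps, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    open_heap = [(h(start), 0, start)]
    best = {start: 0}
    while open_heap:
        f, g, pos = heapq.heappop(open_heap)
        if pos == goal:
            return g
        if g > best.get(pos, float("inf")):
            continue  # stale heap entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = pos[0] + dr, pos[1] + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

grid = ["....",
        ".##.",
        "...."]
print(astar(grid, (0, 0), (2, 3)))
```

Swapping the heuristic is the generalization point: any admissible estimate (never overestimating the true remaining cost) keeps the result optimal while pruning more of the search.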
> The least-likely part of the work is behind us; the scientific insights that got us to systems like GPT-4 and o3 were hard-won, but will take us very far.
> 2026 will likely see the arrival of systems that can figure out novel insights
The level of confidence is interesting compared to recent comments by Sundar [1]; Satya [2] is also a bit more reserved in his optimism.
I've done something similar for learning about controversial topics. I ask it to act as 'Bob', a well-informed supporter of one side (like Ukraine), then as 'Alice', a well-informed supporter of the other side (Russia), and have them debate each other over a few prompts with a moderator named 'Sue'.
Then, after a few rounds of the debate where Sue asks a bunch of questions, I send it to the judges: Mark, Phil, and Sarah (I give each of them a personality; sometimes I pretend they're famous moral philosophers). Each judge comes up with a rubric and decides who the winner is.
Really fun, and helps me understand different sides of issues.
That seems like a terrible idea. At best it seems likely to help you make a false but convincing-sounding case.
I really hope no one is using that to understand controversial topics, much less to determine their stances on them.
I'd recommend looking into actual human experts who are trustworthy and reading them. Trying to get an LLM to argue the case will just get you a lot of false information presented in a more convincing fashion.
I've been a lukewarm user of Supabase for my side projects. Unfortunately the amount of work to get off of it has been too high for me to leave.
The major issue is cost. It's way more expensive than I realized, because they have so many little ways of charging you; it's almost death by a thousand paper cuts. My bill for my app with just a few thousand users was $70 last month.
I do like the tooling and all, but the pricing has been very confusing.
Kind of the same feeling. I don't use all the services they offer either, and when I looked at self-hosting, it all seemed kind of heavy and fragile. I ended up replicating the parts I used with a small API layer connected to a managed Postgres DB, for roughly a tenth of the cost. I'd say it's pretty handy for prototyping, but I'm not sure I'd want to build a business on the back of it.
A few thousand!? That sounds very reasonable to me. Monetize just two of those users at $35 per month and your server costs are covered. Or run it yourself; there are a lot of moving parts, but it's all open source.
That's quite challenging to do. I've spent quite a few hours looking into it myself and concluded that they make it their goal to complicate self-hosting through a lack of detailed docs. For example, I recall a warning in their docs along the lines of "for production, don't use this default setup", and it felt more like "tough luck, figure it out or fork out $ome ca$h". Perfectly fine business model, but not 100% self-hostable at a production level (even for a very basic app).
This is exactly the reason I’ve been avoiding it despite seeing it mentioned all the time. I’m sure I’m missing out on some conveniences but it’s just too cheap to host my own pg DB. I can deal with backups and auth if it means saving a not so insignificant amount of money per month.
There is a similar issue with image and video generation. Prompts like "Generate an image of a man holding a pencil with his left hand" or "Generate a clock showing the time 5 minutes past 6 o'clock" often fail because so many images in the training set look alike (almost all clock images show 10:10: https://generativeai.pub/in-the-ai-art-world-the-time-is-alm...).