Hacker News | RobinL's comments

Love this! Would be super interested in any details the author could share on the data engineering needed to make this work. The vis is super impressive but I suspect the data is the harder thing to get working.

Most of the time and energy has gone into getting my head around the source data [0] and the industry-specific nuances.

In terms of stack I have a self-hosted Dagster [1] data pipeline that periodically dumps the data onto Cloudflare R2 as parquet files. I then have a self-hosted NodeJS API that uses DuckDB to crunch the raw data and output everything you see on the map.

[0] Mostly from https://bmrs.elexon.co.uk/

[1] https://dagster.io/
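
If anyone's curious what the DuckDB step looks like, here's a minimal Python sketch (the real API is NodeJS, and the bucket path and column names below are made up rather than my actual code):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")  # lets DuckDB read parquet straight over HTTP(S)

    # Hypothetical public R2 object written by the Dagster pipeline
    df = con.execute("""
        SELECT settlement_date, fuel_type, SUM(mw) AS total_mw
        FROM read_parquet('https://<bucket>.r2.dev/generation/2024-01-01.parquet')
        GROUP BY settlement_date, fuel_type
        ORDER BY settlement_date
    """).df()

    print(df.head())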


My wife's old company, a fairly significant engineering consultancy, ran its entire time/job management and invoicing system from a company-wide, custom-developed Microsoft Access app called 'Time'.

It was developed by a single guy in the IT department and she liked it.

About 5 years ago the company was acquired, and they had to move to their COTS 'enterprise' system (Maconomy).

All staff from the old company had to do a week-long (!) training course on how to use this, and she hates it.

In future I think there will be more things like 'Time' (though presumably not MS Access based!)


> In future I think there will be more things like 'Time' (though presumably not MS Access based!)

That's my assertion - things like 'Time' can be developed by an AI primarily because they don't depend on an existing community of people to hire from.

It's an example of a small ERP system - no consultants, no changes, no community, etc.

Large systems (Sage, SAP, Syspro, etc) are purchased based on the existing pool of contractors that can be hired.

Right now, if you had a competing SAP/Syspro system freshly developed, that had all the integrations that a customer needs, how on earth will they deploy it if they cannot hire people to deploy it?


Not to mention Sage's midline - Sage 100 is incredibly cheap and effective for its cost. I mean, it's ridiculous what mature software can do. Everything under the sun, basically, for a pittance.

It's certainly not "SAP 10 million dollar deployments". We rarely see implementations run into six figures for SMB distributors and manufacturing firms. That's less than most of their yearly budget for buying new fleet vehicles or equipment.


I still think MS Access was awesome. In the small companies I worked at, it was used successfully by moderately tech-savvy directors and support employees to manage ERP, license generation, invoices, etc.

The most common gripe was concurrent access to the database file, but I think that was solved by backing the forms with data accessed over ODBC instead.

It looked terrible but also was highly functional.


Agreed! The first piece of software I built was a simple inventory and sales management system, around 2000. I was 16 and it was just about my first experience programming.

It was for school, and I recently found the write up and was surprised how well the system worked.

Ever since I've marvelled at how easy it was to build something highly functional that could incorporate complex business logic, and wished there was a more modern equivalent.


Maybe a combination of Airtable and Power BI (or an open-source alternative)? Or just MS Access backed by a proper database?

Grist [1] is great for this stuff. At first glance it's a spreadsheet, but that spreadsheet is backed by a SQLite database, and you can put an actual UI on top of it without leaving the tool, or write full-blown plugins in JavaScript and HTML if you need to go further than that.

[1] https://www.getgrist.com/


Just another yay for Grist here! I've been looking for an Access alternative for quite a while and nothing really comes close. You can try hacking it together with various BI tools, but nothing really feels as accessible as the original Access. While it's not a 1:1 mapping and the graphical report building is not really there, you can still achieve what you need. It's like Access 2.0 to me.

Access as a front end for MS SQL Server ran great in a small shop. Seems like there was a wizard that imported the Access tables easily into SQL Server.

I've not seen anything as easy to use as the Access visual query builder and drag-n-drop report builder thing.


Agree. Much of the value of devs is understanding the thing they're working on, so they know what to do when it breaks and what new features it can easily support. Doesn't matter whether they wrote the code, a colleague wrote it, or an AI.

Yep, writing the code might have gotten a little bit easier, but it was never the hard part to begin with.

> saw absolutely zero value in it

At the very least, it can quickly build throwaway productivity enhancing tools.

Some examples from building a small education game:

- I needed to record sound clips for the game. I vibe coded a webapp in <15 mins that had a record button and keyboard shortcuts to progress through the list of clips I needed, outputted the audio for over 100 separate files in the folder structure and with the file names I needed, and wrote the ffmpeg script to post-process the files.

- I needed JSON files for the path of each letter. Gemini 3 converted images to JSON, and then Codex built me an interactive editor to tidy up by hand the bits Gemini got wrong.

The quality of the code didn't matter because all I needed was the outputs.

The final games: https://www.robinlinacre.com/letter_constellations and https://www.robinlinacre.com/bee_letters/

Code: https://github.com/robinL/


Does anyone have direct experience with Claude making damaging mistakes in dangerously skip permissions mode? It'd be great to have a sense of what the real world risk is.


Claude is very happy to wipe remote dbs, particularly if you're using something like supabase's mcp server. Sometimes it goes down rabbitholes and tries to clean itself up with `rm -rf`.

There is definitely a real world risk. You should browse the ai coding subreddits. The regularity of `rm -rf` disasters is, sadly, a great source of entertainment for me.

I once was playing around, having Claude Code (Agent A) control another instance of Claude Code (Agent B) within a tmux session using tmux's scripting. Within that session, I messed around with Agent B to make it output text that made Agent A think Agent B had rm -rf'd the entire codebase. It was such a stupid "prank", but seeing Agent A's frantic and worried reaction to Agent B's mistake was the loudest and only time I've laughed because of an LLM.


Why in the hell would it be able to access a _remote_ database?! In no acceptable dev environment would someone be able to access that.


Everywhere I’ve ever worked, there was always some way to access a production system even if it required multiple approvals and short-lived credentials for something like AWS SSM. If the user has access, the agent has access, no matter how briefly.


Not if you require auth with a Yubikey, not if you run the LLM client inside a VM which doesn't have your private ssh key, ...


Supabase virtually encouraged it last year haha. I tried it once and noped out after an hour, when Claude tried to do a bunch of migrations on prod instead of dev.

https://web.archive.org/web/20250622161053/https://supabase....

Now, there are some actual warnings. https://supabase.com/docs/guides/getting-started/mcp#securit...


I think LLMs are exposing how slapdash many people work when building software.


Claude has twice now thought that deleting the database was the right thing to do. It didn't matter, as it was a local one created with fixtures in the Docker container (in anticipation of such a scenario), but it was an inappropriate way of handling Django migration issues.


One recent example: for some reason, Claude has recently preferred to write scripts in the root /tmp folder. I don't like this behavior at all. It's nothing destructive, but it should be out of scope by default. I notice they keep adding more safeguards, which is great, e.g. asking for permissions, but it seems to be case by case.


If you're not using .claude/instructions.md yet, I highly recommend it; for moments like this one, you can tell it where to shove scripts. The tricky part with the instructions file is that Claude only reads it during a new prompt, so any time you update it, or Claude "forgets" instructions, ask it to re-read it; that usually does the trick for me.


Claude, I noticed you rm -rf'd my entire system. Your .instructions.md file specifically prohibits this. Please re-read your .instructions.md file and comply with it for all further work.


IMHO a combination of trash CLI and a smarter shell program that prevents deleting critical paths would do it.

https://github.com/andreafrancia/trash-cli



When approving actions "for this project" I actively monitor .claude\settings.local.json, as

"Bash(az resource:*)",

is much more permissive than

"Bash(az resource show:*)",

It mostly gets it right, but I instantly fix the file with the "readonly" version when it makes things too open.


I caught Claude using Docker (running as root) to access files on my machine that it couldn't read as its own user.


It feels like most people are exposing how wild west their environments are.


I think that's totally fine for individual work, but in larger data engineering teams it's less good to switch between tools because other people may have to maintain your code.

That said, polars is good, and if the team agree to standardise on it then that's a totally reasonable choice.

I guess one of my reservations is I've been (historically) burned by decisions within data eng teams to use pandas, causing all sorts of problems with data typing and memory and eventually having to rewrite it all. But I accept polars doesn't suffer from the same problems (and actually some of them are even mitigated in more recent versions of pandas).
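
As a concrete (if contrived) example of the typing problems I mean - pandas silently promotes an integer column containing a missing value to float, whereas polars keeps it as a nullable integer (toy data, not from any real pipeline):

    import io

    import pandas as pd
    import polars as pl

    csv = "id,score\n1,10\n2,\n3,30\n"  # 'score' is an integer column with one missing value

    pdf = pd.read_csv(io.StringIO(csv))
    print(pdf["score"].dtype)      # float64: the int column is silently promoted to float

    plf = pl.read_csv(io.BytesIO(csv.encode()))
    print(plf.schema["score"])     # Int64: the type is kept, with a null for the missing value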


Worse in some ways, better in others. DuckDB is often an excellent tool for this kind of task. Since it can run parallelized reads, I imagine it's often faster than the command line tools, and with easier-to-understand syntax.


More importantly, you have your data in a structured format that can be easily inspected at any stage of the pipeline using a familiar tool: SQL.

I've been using this pattern (scripts or code that execute commands against DuckDB) to process data more recently, and the ability to do deep investigations on the data as you're designing the pipeline (or when things go wrong) is very useful. With a code-based solution (reading data into objects in memory), it's much harder to view the data: using debugging tools to inspect objects on the heap is painful compared to being able to JOIN/WHERE/GROUP BY your data.
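
A minimal sketch of the kind of ad-hoc inspection I mean (the file names and columns here are hypothetical):

    import duckdb

    # Poke at an intermediate stage of the pipeline with plain SQL -
    # no custom debug tooling needed.
    duckdb.sql("""
        SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
        FROM read_parquet('stage2_orders.parquet') AS o
        LEFT JOIN read_csv_auto('customers.csv') AS c USING (customer_id)
        WHERE o.amount > 0
        GROUP BY c.region
        ORDER BY total DESC
    """).show()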


Yep. It's literally what SQL was designed for: your business website can be running it… then you write a shell script to also pull some data on a cron. It's beautiful.


IMHO the main point of the article is that the typical unix command pipeline IS parallelized already.

The bottleneck in the example was maxing out disk IO, which I don't think duckdb can help with.


Pipes are parallelized when you have unidirectional data flow between stages. They really kind of suck for fan-out and joining though. I do love a good long pipeline of do-one-thing-well utilities, but that design still has major limits. To me, the main advantage of pipelines is not so much the parallelism, but being streams that process "lazily".

On the other hand, unix sockets combined with socat can perform some real wizardry, but I never quite got the hang of that style.


Pipelines are indeed one flow, and that works most of the time, but shell scripts make parallel tasks easy too. The shell provides tools to spawn subshells in the background and wait for their completion. Then there are utilities like xargs -P and make -j.


UNIX provides the Makefile as the go-to tool if a simple pipeline is not enough. GNU make makes this even more powerful by being able to generate rules on the fly.

If the tool of interest works with files (like the UNIX tools do) it fits very well.

If the tool doesn't work with single files, I've had some success using Makefiles for generic processing tasks by creating a marker file as the target to record that a given task was complete.


Found myself nodding along. I think increasingly it's useful to think of PRs from unknown external people as more like an issue than a PR (kind of like the 'issue first' policy described in the article).

There's actually something very valuable about a user specifying what they want using a working solution, even if the code is not mergeable.


Exactly - these huge machines are surely eating a lot into the need for distributed systems like Spark. So much less of a headache to run as well


Author here. I wouldn't argue SQL or duckdb is _more_ testable than polars. But I think historically people have criticised SQL as being hard to test. Duckdb changes that.

I disagree that SQL has nothing to do with fast. One of the most amazing things to me about SQL is that, since it's declarative, the same code has got faster and faster to execute as we've gone through better and better SQL engines. I've seen this through the past five years of writing and maintaining a record linkage library. It generates SQL that can be executed against multiple backends. My library gets faster and faster year after year without me having to do anything, due to improvements in the SQL backends that handle things like vectorisation and parallelization for me. I imagine if I were to try and program the routines by hand, it would be significantly slower since so much work has gone into optimising SQL engines.

In terms of future proof - yes in the sense that the code will still be easy to run in 20 years time.
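
As a toy illustration (made-up file and column names, not my library's actual generated SQL): the same declarative statement can be handed to different engines, and each one is free to parallelise and vectorise it however it likes.

    import duckdb

    # A fragment in the spirit of generated record linkage SQL:
    # produce candidate pairs by blocking on surname.
    sql = """
        SELECT l.unique_id AS id_l, r.unique_id AS id_r
        FROM read_csv_auto('people.csv') AS l
        JOIN read_csv_auto('people.csv') AS r
          ON l.surname = r.surname AND l.unique_id < r.unique_id
    """

    pairs = duckdb.sql(sql).df()
    # Swap the read_csv_auto() table functions for table names and the same
    # statement runs on Spark SQL or another backend, unchanged.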


> I disagree that SQL has nothing to do with fast. One of the most amazing things to me about SQL is that, since it's declarative, the same code has got faster and faster to execute as we've gone through better and better SQL engines.

Yeah, but SQL isn't really portable between all query engines. You always have to be speaking the same dialect. Also, SQL isn't the only "declarative" DSL; polars's LazyFrame API is similarly declarative. Technically Ibis's dataframe DSL also works as a multi-frontend declarative query language. Or even Substrait.

Anyway, my point is that SQL is not inherently a faster paradigm than "dataframes" - you're conflating declarative query planning with SQL.

