unoti's comments | Hacker News

> Why can’t the LLM just learn your writing style from your previous emails to that person?

It totally could. For one thing, you could fine-tune the model, but I don't think I'd recommend that. For this specific use case, imagine an addition to the prompt that says: """To help you with additional context and writing style, here are snippets of recent emails Pete wrote to {recipient}: --- {recent_email_snippets} """
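A minimal sketch of how that prompt addition might be assembled, assuming a hypothetical `recent_emails_to(recipient)` helper that pulls the last few messages from the mail store:

    # Hypothetical helper and names, just to illustrate splicing recent
    # messages into the prompt as style/context examples.
    def build_style_context(recipient: str, recent_emails_to, max_snippets: int = 3) -> str:
        snippets = recent_emails_to(recipient)[:max_snippets]
        joined = "\n---\n".join(snippets)
        return (
            "To help you with additional context and writing style, "
            f"here are snippets of recent emails Pete wrote to {recipient}:\n"
            f"---\n{joined}\n---"
        )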


> LLMs lie about their reasoning

People do this all the time too! CAT scans show that people make up their minds quickly: the part of the brain that makes snap judgements activates first, and a fraction of a second later the part associated with rational reasoning begins to activate. People in sales have long known this, wanting to give people emotional reasons to make the right decision, while also giving them the rational data needed to support it. [1]

I remember seeing this illustrated ourselves when our team of 8 or so people was making a big ERP purchasing decision between Oracle ERP and PeopleSoft long ago. We had divided what our application needed to do into over 400 feature areas, and for each one we developed a very structured set of evaluation criteria. Then we put weights on each of those to express how important it was to us. We had a big spreadsheet to rank everything.

But along the way of the 9-month sales process, we enjoyed working with the Oracle sales team a lot more. We felt like we'd be able to work with them better. In the end, we ran all the numbers, and PeopleSoft came out on top. And we sat there and soberly looked each other in the eyes, and said "We're going with Oracle." (Actually I remember one lady on the team when asked for her vote said, "It's gotta be the big O.")

Salespeople know that ultimately it's a gut decision, even if the people buying things don't realize that themselves.

[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC6310859/


> People do this all the time too

I wish people would stop comparing AI to humans, honestly.

I know humans are flawed. We all know.

The appeal of computer systems is that they are consistent. The ideal software is bug-free, with zero flaws.

Creating human-like computer systems is so worthless. Why would we want to make them less predictable and less consistent?


Language models happen to share human flaws, but like humans they can amplify their abilities and reliability by building and using reliable tools.


I actually prefer a system that's correct half of the time at thousands of times the cost & speed.


The real answer is it's completely domain-specific. If you're trying to search for something that you'll instantly know when you see it, then something that can instantly give you 5 wrong answers and 1 right answer is a godsend and barely worse than something that is right 100% of the time. If the task is to be an authoritative designer of a new aeroplane, it's a different story.


Because we can still do things computers can't and that's interesting


When using AI, you are still the one responsible for the code. If the AI writes code and you don't read every line, why did it make its way into a commit? If you don't understand every line it wrote, what are you doing? If you don't actually love every line it wrote, why didn't you make it rewrite it with some guidance or rewrite it yourself?

The situation described in the article is similar to letting junior developers we don't trust commit code, releasing it to production, and then blaming the failure on them.

If a junior on the team does something dumb and causes a big failure, I wonder where the senior engineers and managers were during that situation. We closely supervise and direct the work of those people until they've built the skills and ways of thinking needed to be ready for that kind of autonomy. There's a reason we have multiple developers of varying levels of seniority: trust.

We build relationships with people, and that is why we extend them trust. We don't extend trust to people until they have demonstrated they are worthy of it over a period of time. At the heart of relationships is that we talk to each other and listen to each other, grow and learn about each other, are coachable, and get onto the same page with each other. Although there are ways to coach LLMs and fine-tune them, LLMs don't do nearly as good a job at this kind of growth and trust building as humans do. LLMs are super useful and absolutely should be worked into the engineering workflow, but they don't deserve the kind of trust that some people erroneously give them.

You still have to care deeply about your software. If this story talked about inexperienced junior engineers messing up codebases, I'd be wondering where the senior engineers and leadership were in allowing that to mess things up. A huge part of engineering is all about building reliable systems out of unreliable components and always has been. To me this story points to process improvement gaps and ways of thinking people need to change more than it points to the weak points of AI.


The pace differs though. A junior would need a week for a feature an LLM can produce in an hour. And you’re expected to validate that just as quickly. And LLMs are trained to appeal to the reader, unlike an average junior dev. Devs will only get lazy the more they rely on LLMs. It’s like you’re at the university and there’s no homework anymore, just lectures. You’re just passively ingesting data, not getting trained on real problems because you’ve got AI to do that for you. So you’re no longer challenged to grow anymore in your domain. What’s left are hard problems that the AI will mislead you on because it’s unfamiliar with them, and your opportunity to learn was lost to delegating to AI. In the end the pressure will grow at work, more features will be expected in shorter time frames. You’ll get even less time to learn and grow as a developer or engineer.


I hear you. But consider this: substitute “LLM” in what you said above with “coworker” or “direct report”.

Does having a coworker automatically make a person dumb and no longer willing or able to grow? Does an engineer who becomes a manager instantly lose their ability to work or grow or learn? Sometimes, yes I know, but it’s not a foregone conclusion.

Agents are a new tool in our arsenal and we get to choose how we use them and what it will do for us, and what it will do to us, each as individuals.


You can substitute anything like “banana” or “parrot” for that and ask the same question.

Change of roles is a twist I didn't suggest; it's not related to my argument. I was talking about an engineering role. I'm not seeing an analogy with what you're suggesting. Even less so does your suggested "immediately" resonate with me. Such transitions are rarely immediate. Growth on an alternative career path is a different story.

The problem that I see here is that we're not given the choice you're describing. Take for example the recent Shopify pivot. It is now expected by management because they believe the exaggerated hype, especially during the ongoing financing crunch in many places. So it's not a lawnmower we're talking about here but an oracle one would need to be capable of challenging.


I came here looking for information about why Ericsson stopped using Erlang, and for more information about Joe's firing.

The short answer seems to be that they pivoted to Java for new projects, which marginalized Erlang. Then Joe and colleagues formed Bluetail in 1998, which was bought by Nortel. Nortel was a telecom giant making up about a third of the value of the Toronto Stock Exchange. In 2000 Nortel's stock reached $125 per share, but by 2002 it had fallen to less than $1. This was all part of the dot-com crash, and Nortel was hit particularly hard because the bubble bursting coincided with a big downturn in telecom spending.

It seems safe to look at Joe's layoff as more of a "his unit was the first to slip beneath the waves on a sinking ship" situation, as they laid off 60,000 employees, more than two thirds of their workforce. The layoff was not a sign that he wasn't pulling his weight; it was a desperate move, not to be taken as a sign of the ineffectiveness of that business unit.


It's very weird to me to see the word "fired" in this context. "Laid off" is more appropriate. "Fired" is very value-laden and implies fault and termination with cause. Which I'm sure if that was somehow actually true the original article author would know nothing about, nor would it be any of their business.


> I never used much IBM hardware, and always assumed that their terminals used EBCDIC and not ASCII.

Generally this is true. But around this era I used a ton of 3151 terminals on AIX, IBM's version of Unix. They were connected to the RS/6000 line of AIX machines. Good times! These machines, as Unix machines, talked in ASCII to their terminals. There was a whole line of port concentrators which would let you connect something like 32 ASCII terminals to a little block about the size of a modern ethernet router, and you could connect 4 or so of these blocks to a device in the main machine.


Good info. I've also used AIX, but more recently (after 1995). By then, I think EBCDIC had lost the standards battle.


> you can work on your problem, or you can customize the language to fit your problem better

There’s a thing I’m whispering to myself constantly as I work on software: “if I had something that would make this easy, what would it look like?”

I do this continuously, whether I’m working in C++ or Python. Although the author was talking about Lisp here, the approach should be applied to any language. Split the problem up into an abstraction that makes it look easy. Then dive in and make that abstraction, and ask yourself again what you’d need to make this level easy, and repeat.

Sometimes it takes a lot of work to make some of those parts look and be easy.

In the end, the whole thing looks easy, and your reward is someone auditing the code and saying that you work on a code base of moderate complexity and they’re not sure if you’re capable enough to do anything that isn’t simple. But that’s the way it is sometimes.


Yes! I call this sort of top-down programming "wishful thinking." It is much easier to explain to people these days, because of machine learning tools.

“if you can just trust that chat GPT will later fill in whatever stub functions you write, how would you write this program?” — and you can quickly get going, “well, I guess I would have a queue, while the queue is not empty I pull an item from there, look up its responsible party in LDAP, I guess I need to memoize my LDAP queries so let's @cache that LDAP stub, if that party is authorized we just log the access to our S3-document, oh yeah I need an S3-document I am building up... otherwise we log AND we add the following new events to the queue...”
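To make that concrete, here is roughly what that stubbed-out skeleton might look like in Python (every name below is invented for illustration; the stubs are the parts you trust will get filled in later):

    from collections import deque
    from functools import cache

    @cache
    def lookup_responsible_party(user_ref):
        """Stub: ask LDAP who is responsible; memoized so repeat queries are free."""
        raise NotImplementedError

    def is_authorized(party):
        """Stub: decide whether this party is allowed access."""
        raise NotImplementedError

    def process(initial_events, document, log):
        queue = deque(initial_events)
        while queue:
            item = queue.popleft()
            party = lookup_responsible_party(item.user_ref)
            if is_authorized(party):
                document.log_access(item)              # the S3 document we're building up
            else:
                log.warning("unauthorized: %s", item)
                queue.extend(item.follow_up_events())  # add new events to keep processing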

It is not the technique that has most enhanced what I write, which is probably a variant on functional core, imperative shell. But it's pretty solid as a way to break the writer's block that you face in any new app.


"Wishful thinking" is exactly what it's called in SICP. You write the code to solve your problem using the abstraction you want, then you implement the abstraction.


I want to hear more about this functional core/imperative shell....


It's by Gary Bernhardt: https://www.destroyallsoftware.com/screencasts/catalog/funct...

He also did another talk expanding the concept called Boundaries: https://www.destroyallsoftware.com/talks/boundaries


Satvik gave you a fine link in a sibling comment, but I like to add something that I call "shell services" so let me give maybe the smallest example I've got of what it looks like, this is a little Python lambda. First you have a core module, which only holds data structures and deterministic transforms between them:

    core/app_config.py   (data structures to configure services)
    core/events.py       (defines the core Event data structure and such)
    core/grouping.py     (parses rule files for grouping Events to send)
    core/parse_events.py (registers a bunch of parsers for events from different sources)
    core/users.py        (defines the core user data structures)
(there's also an __init__.py to mark it as a module, and so forth). There is some subtlety, for instance events.py contains the logic to turn an event into a slack message string or an email, and an AppConfig contains the definition of what groups there are and whether they should send an email or a slack message or both. But everything here is a deterministic transform. So for instance `parse_event` doesn't yet know what User to associate an event with, so `users.py` defines a `UserRef` that might be looked up to figure out more about a user, and there is a distinction between an `EventWithRef`, which contains a `list[UserRef]` of user-refs to try, and an `Event`, which contains a User.
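A rough sketch of what those core types could look like (field names here are invented; the point is that they are plain data with no I/O attached):

    from dataclasses import dataclass
    from enum import Enum

    class UserRefType(Enum):
        EMAIL = "email"
        AMAZON_ID = "amazon_id"

    @dataclass(frozen=True)
    class UserRef:
        ref_type: UserRefType
        value: str

    @dataclass(frozen=True)
    class User:
        name: str
        email: str

    @dataclass(frozen=True)
    class EventWithRef:
        description: str
        user_refs: list[UserRef]    # candidate refs to try, in order

    @dataclass(frozen=True)
    class Event:
        description: str
        user: User                  # resolved once a lookup succeeds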

Then there's the services/ module, which is for interactions with external systems. These are intentionally as bare as possible:

    services/audit_db.py      (saves events to a DB to dedupe them)
    services/config.py        (reads live config params from an AWS environment)
    services/notification.py  (sends emails and slack messages)
    services/user_lookup.py   (user queries like LDAP to look up UserRefs)
If they need to hold onto a connection, like `user_lookup` holds an LDAP connection and `audit_db` holds a database connection, then these are classes whose __init__ takes some subset of an AppConfig to configure itself. Otherwise, like for the email/slack sends in the notification service, these are just functions which take part of the AppConfig as a parameter.

These functions are as simple as possible. There are a couple audit_db functions which perform -gasp- TWO database queries, but it's for a good reason (e.g. lambdas can be running in parallel so I want to atomically UPDATE some rows as "mine" before I SELECT them for processing notifications to send). They take core data structures as inputs and generate core data structures as outputs and usually I've arranged for some core data structure to "perfectly match" what the service produces (Python TypedDict is handy for this in JSON-land).
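That "claim, then read" trick might look roughly like the following sketch (table and column names are invented, DB-API-style placeholders assumed):

    def reserve_unreported_events(conn, worker_id):
        # Atomically claim unclaimed rows so parallel lambdas don't double-send,
        # then read back only the rows this worker just claimed.
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE events SET claimed_by = %s"
                " WHERE claimed_by IS NULL AND reported = FALSE",
                (worker_id,),
            )
            cur.execute(
                "SELECT id, payload FROM events"
                " WHERE claimed_by = %s AND reported = FALSE",
                (worker_id,),
            )
            return cur.fetchall()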

"Simple" can be defined approximately as "having if statements", you can say that basically all if/then logic should be moved to the functional core. This requires a bit of care because for instance a UserRef contains an enum (a UserRefType) and user_lookup will switch() on this to determine which lookup it should perform, should I ask LDAP about an email address, should I ask it about an Amazon user ID, should I ask this other non-LDAP system. I don't consider that sort of switch statement to be if/then complexity. So the rule of thumb is that the decision of what lookups to do, is made in Core code, and then actually doing one is performed from the UserLookupService.

If you grok type theory, the idea more briefly is, "you shouldn't have if/then/else here, but you CAN have try/catch and you CAN accept a sum type as your argument and handle each case of the sum type slightly differently."
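In code, that rule of thumb might look like the sketch below: the core has already decided which kind of lookup to do by choosing the UserRefType, and the service just dispatches on it (names match the illustrative core types sketched earlier):

    class UserLookupService:
        def __init__(self, ldap_conn):
            self.ldap = ldap_conn

        def resolve(self, ref: UserRef) -> User:
            # No business decisions here: the core chose the ref type,
            # we only perform the corresponding external query.
            if ref.ref_type is UserRefType.EMAIL:
                return self._by_email(ref.value)
            if ref.ref_type is UserRefType.AMAZON_ID:
                return self._by_amazon_id(ref.value)
            raise ValueError(f"unhandled ref type: {ref.ref_type}")

        def _by_email(self, email: str) -> User: ...
        def _by_amazon_id(self, user_id: str) -> User: ...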

Finally there's the parent structure,

    main.py        (main entrypoint for the lambda)
    migrator.py    (a quick DB migration script)
    ../sql/        (some migrations to run)
    ../test/       (some tests to run)
Here's the deal: main.py is like 100 lines long, gluing the core to the services. So if you printed it, it's only three pages of reading, and then you know: "oh, this lambda gets an AppConfig from the config service, initializes some other services with that, does the database migrations, and then after all that setup is done, it proceeds in two phases. In the first ingestion phase it parses its event arguments to EventWithRefs, then looks up the list of user refs to a User and makes an Event with it; then it labels those events with their groups, checks those groups against an allowlist and drops some events based on the allowlist, and otherwise inserts those events into the database, skipping duplicates. Once all of that ingestion is done, phase two of reporting starts: it reserves any unreported records in the database, groups them by their groups, and for each group tells the notification service "here's a bunch of notifications to send," and for each successful send we mark all of the events we were processing as reported. Last we purge any records older than our retention policy and we close the database connection." You get the story in broad overview in three pages of readable code.

Migrator.py adds about two printed pages more to do database migrations. In its current form it makes its own DB connections from strings so it doesn't depend on core/ or services/; it's kind of an "init container" app, except AWS Lambda isn't containerized in that way.

The test folder is maybe the most important part, because based on this decoupling,

- The little pieces of logic that haven't been moved out of main.py yet, can be tested by mocking. This can be reduced arbitrarily much -- in theory there is no reason that a Functional Core Imperative Shell program needs mocks. (Without mocks, the assurance that main.py works is that main.py looks like it works and worked previously and hasn't changed, it's pure glue and high-level architecture. If it does need to change, the assurance that it works is that it was deployed to dev and worked fine there, so the overall architecture should be OK.)

- The DB migrations can be tested locally by spinning up a DB with some example data in it, and running migrations on it.

- The core folder can be tested exhaustively by local unit tests. This is why it's all deterministic transforms between data structures -- that's actually, if you like, what mocking is, it's an attempt to take nondeterministic code and make it deterministic. The functional core, is where all the business logic is, and because it's all deterministic it can all be tested without mocking.

- The services, can be tested pretty well by nonlocal unit "smoke"/"integration" tests, which just connect and verify that "if you send X and parse the response to data structure Y, no exception gets thrown and Y has some properties we expect etc." This doesn't fully test the situations where the external libraries called by services, throw exceptions that aren't caught. So like you can easily test "remote exists" and "remote doesn't exist" but "remote stops existing halfway through" is untested and "remote times out" is tricky.

- The choice to test stuff in services, depends a lot on who has control over it. AuditDBService is always tested against another local DB in a docker container with test data preloaded, because we control schema, we control data, it's just a hotspot for devs to modify. config.py's `def config_from_secrets_manager()` is always run against the AWS Secrets Manager in dev. UserLookupService is always tested against live LDAP because that's on the VPN and we have easy access to it. But like NotificationService, while it probably should get some sort of API token and send to real Slack API, we haven't put in the infrastructure for that and created a test Slack channel or whatever... so it's basically untested (we mock the HTTP requests library, I think?). But it's also something that nobody has basically ever had to change.

Once you see that you can just exhaustively test everything in core/ it becomes really addictive to structure everything this way. "What's the least amount of stuff I can put into this shell service, how can I move all of its decisions to the functional core, oh crap do I need a generic type to hold either a User or list[UserRef] possible lookups?" etc.
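The payoff is that a core test is just data in, data out, no mocks; something like this (the `group_events` function and rule format are invented for illustration):

    def test_group_events_splits_by_matching_rule():
        events = [
            Event(description="disk full on host-7",
                  user=User(name="Ana", email="ana@example.com")),
            Event(description="login failed for bo",
                  user=User(name="Bo", email="bo@example.com")),
        ]
        rules = {"infra": ["disk"], "security": ["login"]}
        grouped = group_events(events, rules)   # pure function, e.g. in core/grouping.py
        assert set(grouped) == {"infra", "security"}
        assert grouped["infra"][0].description == "disk full on host-7"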


“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”

George Bernard Shaw, Man and Superman


C++ is for when complexity cost is worth trading for performance gain. What type of person successfully finds simplicity working in C++?


> C++ is for when complexity cost is worth trading for performance gain. What type of person successfully finds simplicity working in C++?

Be the change you want to see!

Every language has the raw materials available to turn the codebase into an inscrutable complex mess, C++ more than others. But it’s still possible to make it make sense with a principled approach.


Those of us that grew up with the language always used it instead of C when given the option, so even if no one can claim expertise, we know it well enough to find it relatively simple to work with.

In the same vein, Python looks simple on the surface, but in reality it is a quite deep language once people move beyond using it as a DSL for C/Fortran libraries or in introduction-to-programming scenarios.


I think Casey Muratori calls this "compression". And yeah (see other child comment), he does it in C++ :-)


Creating a Domain Specific Language (DSL) for a given task is a classic approach to solving problems.


Regarding stamina defined broadly as it is in this article: One of my favorite quotes comes from an unlikely source: Mike Tyson.

"I don't care how good you are at anything. You don't have discipline you ain’t nothin. Discipline is doing what you hate but doing it like you love it"

[1] Mike Tyson Fighter's Coldest Quotes Of All Time https://www.youtube.com/watch?v=fz9JUENem78&t=100s


I noticed 4o mini didn't follow the directions to quote users. My favourite part of the 4.5 summary was how it quoted Antirez. 4o mini brought out the same quote, but failed to attribute it as instructed.


It's fascinating, but while this does mean it strays from the given example, I actually feel the result is a better summary. The 4.5 version is so long you might as well just read the whole thread yourself.


What a century it's been...


> Thing that I don't understand about LLMs at all, is that how it is possible to for it to "understand" and reply in hex (or any other encoding), if it is a statistical "machine"

It develops understanding because that's the best way for it to succeed at what it was trained to do. Yes, it's predicting the next token, but it's using its learned understanding of the world to do it. So it's not terribly surprising if you acknowledge the possibility of real understanding by the machine.

As an aside, even GPT-3 was able to do things like English -> French -> base64. So I'd ask a question, ask it to translate its answer to French, and then base64 encode that. I figured there's like zero chance that this existed in the training data. I've also base64 encoded a question in Spanish and asked it, in the base64 prompt, to respond in base64-encoded French. It's pretty smart and has a reasonable understanding of what it's talking about.
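Constructing that kind of prompt takes only a couple of lines; a sketch (the actual chat API call is elided since it varies by client library):

    import base64

    question_es = "¿Cuál es la capital de Francia? Responde en francés, codificado en base64."
    prompt = base64.b64encode(question_es.encode("utf-8")).decode("ascii")
    # send `prompt` as the user message; then decode the reply:
    # answer_fr = base64.b64decode(reply).decode("utf-8")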


This comment will be heavily downvoted. It is statistically likely to invoke an emotional response.

