Hacker News | sixdimensional's comments

I have a bunch of TI-99 hardware in storage and have been thinking of donating it to a computer museum. I had one in my hands when I was 5 thanks to my grandpa (it made me what I am today!).

Anyone up for a rousing game of Pole Position?


I still don't understand what happened to using Apache Avro [1] for row-oriented fast write use cases.

I think by now a lot of people know you can write to Avro and compact to Parquet; that hand-off is a key area of development, and I'm not sure there's a great solution yet.

Apache Iceberg tables can sit on top of Avro files as one of the storage engines/formats, in addition to Parquet or even the old ORC format.

Apache Hudi [2] was looking into HTAP capabilities: writing to a row store, then compacting (or merge-on-read) into a column store in the background, so you can get the best of both worlds. I don't know where they've ended up.

[1] https://avro.apache.org/

[2] https://hudi.apache.org/


I concur.

Also, it was paid for by US taxpayer dollars - the entire content should have been released somewhere for free. Maybe someone would even have started a new project to maintain it, for example under Wikimedia or some other nonprofit.

This wholesale elimination of valuable information and data owned by the public is so incredibly sad and damaging to our future.

Maybe we need a FOIA request to get the entire contents released to the public.


It seems to be archived on the Wayback Machine, for example https://web.archive.org/web/20260203163430/https://www.cia.g...

It was available for online browsing or as a downloadable file, I think a zip-compressed PDF. I'm sure copies are available, but it would be nice to have an authoritative source.


As far as I can tell the single zip downloadable versions stopped being published after 2020. I grabbed a copy of the 2020 zip from the Internet Archive and turned it into a GitHub repo here: https://github.com/simonw/cia-world-factbook-2020/


Just in case anyone else wants to poke around and discovers there appear to be archived versions after 2020 [1]... don't bother. They all 404. At a guess, the links were created in anticipation of updated zip files that never got made. Lame.

[1] https://web.archive.org/web/*/https://www.cia.gov/the-world-...


> Maybe we need a FOIA request to get the entire contents released to the public.

That’s a sound idea.


If enough people FOIA them maybe they'll decide it's cheaper to just put the archived website back up!


Maybe the next president will do that. I don't think this one will.



Agreed. Though perhaps they will open source some stuff. What would interest me is HOW they got the information they showed.


It was all released into the public domain already. If you can obtain a copy it's yours to do what you like with.


Every country puts out an official gazette with abundant regulatory and statistical information. Of course you'd be foolish to take these at face value, but they're an excellent starting point for assessing the economic activity of any given country. You can then synthesize them with things like market data and publicly available shipping information. Plus the CIA has (at least I hope it still has) a large staff of people whose only job is to study print, broadcast, and electronic media from other countries and compile that into regular reports of What Goes On There.

Obviously there's all sorts of covert information gathering that also goes on, but presumably the product of that is classified by default. Fortunately our executive branch is headed by intellectual types who enjoy reading and synthesizing a wealth of complex detail /s


Half serious, but is that really so different from many apps written by humans?

I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today - things like green-screen apps written in Pick/BASIC, COBOL, etc. Some of them were written once and had subsystems replaced, but some of the code is original.

Systems written in the last 10 to 20 years, on the other hand, I've seen undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand in hand with the rise of agile development (not condemning or approving of it), where rapid change was expected, and often the tech the system was written in was changing rapidly too.

In hardware engineering, I also personally saw a big move to more frequent design and implementation refreshes to head off obsolescence issues (some might call this "planned obsolescence", but it is also done for valid reasons).

I think not reading the code may be a bit premature TODAY, but I don't think it's impossible that someday, in the nearer rather than farther future, generative systems will become predictable enough, and maybe even certified for the safety of their output, that we truly stop reading the generated code.

I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20-year timeframe either; it might be sooner.


I would enjoy discussion with whoever voted this down - why did you?

What is your opinion? Did you vote it down because you think it's silly, dangerous, or because you don't agree?


I like Fastmail with my own domain for personal email, but the reality is nothing is a complete replacement for a Google account, given how tied in it is with auth and the whole Google ecosystem. I still have to use Google for work.

Proton is another one people often suggest. Hey.com sometimes too. No experience with those myself.

There are other options (such as the big guys, iCloud mail or Outlook.com), but aside from self-hosting (which I don't want to spend time maintaining just for my personal mail), I personally haven't seen much outside of those ones that are recommended often.


I do.

Not selling anything, but I am trying to figure out what to do to help support solo and micro entrepreneurs, very small businesses (2-3 people) and very small nonprofits.

I feel like there are a lot more people in this position now (me included), but I don't want to do things for the sake of doing them... I want to find out what solo folks really benefit from and help make sure you get more support.


Have you tried the thought experiment though?

I agree this way seems "wrong", but try putting on your engineering hat and ask: what would you change to make it right?

I think that is a very interesting thread to tug on.


Not the grandparent, but this is "wrong" because it's like asking a junior coder to store/read some values in the database manually (writing an SQL query each time) and then write HTML to output those values. Each time, the junior coder has to do some thinking and looking things up. And the AI is doing a similar thing (using the word "thinking" loosely here).

If the coder is smart, she'll write down the query and note where to put the values; she'll have a checklist of how to load the UI for the database, paste the query, hit run, and copy/paste the output into her HTML. She'll use a standard HTML template. Later she could glue these steps together with some code, so that a program takes the values, puts them in the SQL query, then puts the results in the HTML and sends that HTML to the browser... Oh look, she's made a program - a tool! And if she gets an assignment to read/write some values, she can do it in 1 minute instead of 5. Wow, custom-made programs save time, who could've guessed?
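Her glue program is only a few lines. A minimal sketch of the idea, where the `settings` table, column names, and template are all made up for illustration:

```python
import sqlite3

# Sketch of the "glue" program: store values, query them back with SQL,
# and render the rows into an HTML template. All names are illustrative.
HTML_TEMPLATE = "<ul>{items}</ul>"

def render_values(db, query):
    rows = db.execute(query).fetchall()
    items = "".join(f"<li>{name}: {value}</li>" for name, value in rows)
    return HTML_TEMPLATE.format(items=items)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE settings (name TEXT, value TEXT)")
db.executemany("INSERT INTO settings VALUES (?, ?)",
               [("color", "blue"), ("size", "large")])

html = render_values(db, "SELECT name, value FROM settings")
print(html)  # <ul><li>color: blue</li><li>size: large</li></ul>
```

Once this exists, the "read/write some values" assignment really is a one-minute job: change the query or the template, rerun.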


Thank you for your response!

I agree that spending time on inference or compute every time for the same LLM task is wasteful and the results would be less desirable.

But I don't think the thought experiment should end there. We can continue to engineer around the shortcomings of the approach, IMHO.

You provided a good example of an optimization - tool creation.

Trying to keep my mind maximally open: one could imagine "design time" happening at runtime, where the user interacting with the system describes what they want the first time, and the system assembles the tool (much like we do now with AI-assisted coding, but perhaps without even seeing the code).

Once that piece of the system is working, it is persisted so no more inference is required - essentially as code, a tool that saves time. I think of this as memoizing a function body, i.e. generating and persisting the code.

There could even be some process overseeing the generated code/tool to make sure the quality meets some standard and providing automated iteration, testing, etc if needed.

A big problem is if the LLM never converges to the "right" solution on its own (e.g. the right tool to generate the HTML from the SQL query, without any hallucination). But I'm willing to momentarily punt on that problem as being more about determinism and the quality of the result. The issue isn't the non-deterministic results of an LLM per se anyway; it's whether the quality of the result is fit for purpose for the use case.

I think it's difficult but possible to go further with the thought experiment: a system that "builds itself" at runtime but persists what it builds, based on user interaction and prompting, once the result is satisfactory...

I remember one of the first computer science things I learned: the program that prints out its own source code (a quine). Even then we believed that systems could build and grow themselves.

So my ask would be to look beyond the initial challenge of the first-time cost of generating the tool/code and solve it by persisting a suitable result.

What challenge or problem comes next in this idea?


Running inference for every interaction seems a bit wasteful IMO, especially with a chance for things to go wrong. I'm not smart enough to come up with a way to optimize a repetitive operation though.


I totally agree. The reason I asked before offering any solution ideas was that I was curious what you might think.

My brain went to the concept of memoization that we use to speed up function calls for common cases.

If you had a proxy that sat in front of the LLM and cached deterministic responses for inputs, with some way to give feedback when a response is satisfactory, this could be a building block for a runtime design mode or something like that.
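A minimal sketch of that proxy idea, where `call_llm` is just a placeholder standing in for a real client, and the keying scheme (hash of prompt plus params) is made up for illustration:

```python
import hashlib
import json

def call_llm(prompt, **params):
    # Placeholder for real inference - deterministic by construction here.
    return f"generated response for: {prompt}"

class LLMCache:
    """Cache LLM responses keyed on the exact (prompt, params) pair,
    so repeated identical calls skip inference entirely."""

    def __init__(self, llm=call_llm):
        self.llm = llm
        self.store = {}

    def _key(self, prompt, params):
        blob = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def complete(self, prompt, **params):
        key = self._key(prompt, params)
        if key not in self.store:          # first call: pay for inference
            self.store[key] = self.llm(prompt, **params)
        return self.store[key]             # later calls: cached lookup

cache = LLMCache()
first = cache.complete("what color is the fruit banana?", seed=3)
second = cache.complete("what color is the fruit banana?", seed=3)
assert first == second  # same input, same output, no second inference
```

The "feedback when a response is satisfactory" part would just be a gate on the write: only store the response once a user (or a checker process) approves it.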


I think your last comment hints at the possibility: runtime-generated and persisted code. E.g. the first time you call a function that doesn't exist, the generated implementation persists if it fulfills the requirement... and the next time you just call the materialized function.

Of course the generated code might not work in all cases or scenarios, or may have to be regenerated multiple times, and yes, it would be slower the first time... but subsequent invocations would just run the code that was generated.

I'm trying to imagine what this looks like practically... a system that writes itself as you use it? I feel like there is a thread to tug on there, actually.
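A toy sketch of that "writes itself as you use it" shape, with the generator stubbed out (it always emits the same trivial function, which obviously dodges the hard problems of correctness and validation):

```python
GENERATED_SOURCE = {}   # name -> source text (could be persisted to disk)
COMPILED = {}           # name -> callable, compiled once

def generate_source(name):
    # Stand-in for asking an LLM to write the function body.
    return f"def {name}(x):\n    return x * 2\n"

class SelfWritingModule:
    """On first access to an unknown function, generate its source,
    compile it, and persist it; later calls run plain code."""

    def __getattr__(self, name):
        if name not in COMPILED:             # first call: generate + persist
            src = generate_source(name)
            GENERATED_SOURCE[name] = src
            namespace = {}
            exec(src, namespace)
            COMPILED[name] = namespace[name]
        return COMPILED[name]                # later calls: no "inference"

app = SelfWritingModule()
print(app.double_it(21))  # prints 42 - the function didn't exist until this call
```

The first call pays the generation cost; everything after is an ordinary function call, which is exactly the memoized-function-body idea from upthread.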


This brings a whole new meaning to "memoizing", if we just let the LLM be a function.

In fact, this thought has been percolating in the back of my mind but I don't know how to process it:

If LLMs were perfectly deterministic - i.e. for the same input we get the same output - and we actually started memoizing results for input sets by materializing them, what would that start to resemble?

I feel as though such a thing might start to resemble the source information the model was trained on. The fact that the model compresses all the possibilities into a limited space is exactly what makes it valuable: instead of storing every input, function body, and output that memoizing an LLM could generate, it just stores the model.

But this blows my mind somehow because if we DID store all the "working" pathways, what would that knowledgebase effectively represent and how would intellectual property work anymore in that case?

Thinking about functional programming, there's potential to think of the LLM as the "anything" function, where a deterministic seed and input always produce the same output, with a knowledgebase of pregenerated outputs to speed up retrieval of acceptable results for a given seed and set of inputs... I can't put my finger on it. Is it basically just a search engine then?

Let me try another way...

If I ask an LLM to generate a function for "what color is the fruit @fruit?", where @fruit is the variable, and I memoize that @fruit = banana + seed 3 gives "yellow", then the set {prompt, input @fruit = banana, seed = 3, output = "yellow"} is now a fact that I could just look up.

Would it be faster to retrieve the memoized result than to calculate it via the LLM?
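Almost certainly, since a cache hit is a dict lookup rather than a forward pass through a model. A toy sketch of the banana example, with a fake deterministic model standing in for the LLM (all names and values here are illustrative):

```python
import functools

# Fake deterministic "model": (prompt, input, seed) -> output.
FAKE_MODEL = {("what color is the fruit @fruit?", "banana", 3): "yellow"}

calls = {"inference": 0}

@functools.lru_cache(maxsize=None)
def ask(prompt, fruit, seed):
    calls["inference"] += 1        # count how often we actually "infer"
    return FAKE_MODEL.get((prompt, fruit, seed), "unknown")

ask("what color is the fruit @fruit?", "banana", 3)           # inference runs
answer = ask("what color is the fruit @fruit?", "banana", 3)  # cache hit
print(answer, calls["inference"])  # yellow 1
```

The trade-off is the one named above: the memo table only helps for (prompt, input, seed) triples you've already seen, while the model covers the whole input space by compressing it.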

And what do we do with the thought that that set of information is "always true" with regard to intellectual property?

I honestly don't know yet.


Maybe something here might help with masonry stuff: https://css-tricks.com/piecing-together-approaches-for-a-css...

I stumbled across it looking for CSS flex masonry examples.

