AWS already has a more powerful abstraction, where you can condition permissions so that they are only granted when the request comes from a certain VPC or IP address (e.g. a VPN exit). Malware can thus exfiltrate real credentials, but they'll be worthless.
I'm not prepared to say which abstraction is more powerful but I do think it's pretty funny to stack a non-exfiltratable credential up against AWS given how the IMDS works. IMDS was the motivation for machine-locked tokens for us.
There are two separate concerns here: who the credentials are associated with, and where the credentials are used. IMDS's original security flaw was that it only covered who the credentials were issued to (the VM) and not where they were used, but the aforementioned IAM conditions now ensure that they are indeed used within the same VPC. If a separate proxy is set up to inject credentials, then while this may cover the "where" concern, care must still be taken on the "who" concern, i.e. to ensure that the proxy does not fall victim to confused deputy attacks arising from multiple sandboxed agents attempting to use the same proxy.
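For reference, the "where" half is just an IAM policy condition. A minimal sketch as a Python dict (the VPC ID and CIDR are placeholders, and note that aws:SourceVpc is only populated when the request goes through a VPC endpoint):

    # Hypothetical policy document: deny s3:* unless the request arrives
    # from a specific VPC (via its VPC endpoint) or from the VPN exit range.
    # The VPC ID and CIDR below are made up.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:SourceVpc": "vpc-0123456789abcdef0"},
                "NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"},
            },
        }],
    }

Conditions within a single statement are ANDed, so this deny only fires when the request comes from neither the VPC nor the IP range, which gives the "or" behaviour described above.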
Google's entire (initial) claim to fame was "PageRank", referring both to the ranking of pages and to co-founder Larry Page, which strongly prioritised a relevance signal over raw keyword matches (which then-popular alternatives such as AltaVista, Yahoo, AskJeeves, Lycos, Infoseek, HotBot, etc., relied on), not to mention the rather more notorious paid-ranking schemes in which SERP order was effectively sold. When it was first introduced, Google Web Search was absolutely worlds ahead of any competition. I remember this well, having used those earlier engines and adopted Google quite early (1998/99).
Even with PageRank, result prioritisation is highly subject to gaming. Raw keyword search is far more so (keyword stuffing and other shenanigans), and more so as any given search engine becomes popular and catches the attention of publishers.
Google now applies many additional ranking factors as well. And of course it has come to dominate its own SERPs with paid, advertised listings, which are all but impossible to discern from "organic" results.
(I've not used Google Web Search as my primary tool for well over a decade, and probably only run a few searches per month on it. DDG is my primary; I'll occasionally look at a few others, including Kagi and Marginalia, though rarely.)
PageRank was an innovative idea in the early days of the Internet when trust was high, but yes it's absolutely gamed now and I would be surprised if Google still relies on it.
Fair play to them though, it enabled them to build a massive business.
Though I'd think you'd want to weight anchor text from unaffiliated sites pointing at a given URL much higher than anchor text from affiliated sites.
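Something like this toy Python sketch, where the are_affiliated() heuristic is the genuinely hard (and here purely hypothetical) part:

    # Toy weighting: anchor text from independent sites counts most.
    def are_affiliated(site_a: str, site_b: str) -> bool:
        # Placeholder heuristic: same registrable domain. Real affiliation
        # detection (ownership, link networks, shared ad IDs) is much harder.
        return site_a.lower().split(".")[-2:] == site_b.lower().split(".")[-2:]

    def anchor_weight(linking_site: str, target_site: str) -> float:
        if linking_site == target_site:
            return 0.1   # self-referencing anchors barely count
        if are_affiliated(linking_site, target_site):
            return 0.3   # affiliated sites count a little
        return 1.0       # unaffiliated endorsements count most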
"Affiliation" is a tricky term itself. Content farms were popular in the aughts (though they seem to have largely subsided), firms such as Claria and Gator. There are chumboxes (Outbrain, Taboola), and of course affiliate links (e.g., to Amazon or other shopping sites). SEO manipulation is its own whole universe.
(I'm sure you know far more about this than I do, I'm mostly talking at other readers, and maybe hoping to glean some more wisdom from you ;-)
Oh yeah, there's definitely room for improvement in that general direction. Indexing anchor texts is much better than page rank, but in isolation, it's not sufficient.
I've also seen some benefit from fingerprinting the network traffic websites generate, using a headless browser, to identify which ad networks they load. Very few spam sites have no ads, since there wouldn't be any money in that.
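A rough sketch of that kind of fingerprinting with Playwright; the ad-network host list here is just illustrative:

    # Load a page headlessly and record which known ad/tracking hosts it
    # requests. The host list is illustrative, not exhaustive.
    from urllib.parse import urlparse
    from playwright.sync_api import sync_playwright

    AD_HOSTS = ("doubleclick.net", "googlesyndication.com",
                "taboola.com", "outbrain.com", "adnxs.com")

    def ad_networks_for(url: str) -> set[str]:
        hits: set[str] = set()

        def record(request):
            host = urlparse(request.url).hostname or ""
            hits.update(h for h in AD_HOSTS if host.endswith(h))

        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.on("request", record)
            page.goto(url, wait_until="networkidle")
            browser.close()
        return hits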
The full data set of DOM samples + recorded network traffic is in an enormous SQLite file (400GB+), and I haven't yet worked out a good way of distributing it. Though it's in the back of my mind as something I'd like to solve.
I'd also suspect that there are networks / links which are more likely signs of low-value content than others. Off the top of my head, crypto, MLM, known scam/fraud sites, and perhaps share links to certain social networks might be negative indicators.
You can actually identify clusters of websites based on the cosine similarity of their outbound links. Pretty useful for identifying content farms spanning multiple websites.
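Over binary outbound-domain vectors, cosine similarity reduces to a neat set expression. A minimal sketch (the site names are made up, and a real pipeline would use sparse matrices):

    # For binary vectors, cosine similarity is |A ∩ B| / sqrt(|A| * |B|).
    # Sites in the same content farm tend to share most outbound targets.
    from math import sqrt

    def cosine(outbound_a: set[str], outbound_b: set[str]) -> float:
        if not outbound_a or not outbound_b:
            return 0.0
        return len(outbound_a & outbound_b) / sqrt(len(outbound_a) * len(outbound_b))

    sites = {
        "farm-one.example": {"shop.example", "affil.example", "track.example"},
        "farm-two.example": {"shop.example", "affil.example", "other.example"},
        "blog.example":     {"wikipedia.org", "github.com"},
    }
    print(cosine(sites["farm-one.example"], sites["farm-two.example"]))  # ~0.67
    print(cosine(sites["farm-one.example"], sites["blog.example"]))      # 0.0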
Google’s biggest search signal now is aggregate behavioral data reported from Chrome. That pervasive behavioral surveillance is the main reason Apple has never allowed a native Chrome app on iOS.
It’s also why it is so hard to compete with Google. You guys are talking about techniques for analyzing the corpus of the search index. Google does that and has a direct view into how millions of people interact with it.
Yes indeed, they have an impossibly deep moat and deeper pockets. I'm certainly not trying to compete with them with my little side project, it's just for fun!
> That pervasive behavioral surveillance is the main reason Apple has never allowed a native Chrome app on iOS.
There is a native Chrome app on iOS. It gets all the same url visit data as Chrome on other platforms.
Apple blocks 3rd party renderers and JS engines on iOS to protect its App Store from competition that might deliver software and content through other channels that they don't take a cut of.
Unfortunately this is the bulk of search engine work. Recursive scraping is easy in comparison, even with CAPTCHA bypassing. You either limit the index to only highly relevant sites (as Marginalia does) or you must work very hard to separate the spam from the ham. And spam in one search may be ham in another.
What do you mean they're not relevant? The top result you linked contained the word stackoverflow, didn't it? It's showing you exactly what you searched for. Why would you need a search engine at all if you already know the name of the thing? Just type stackoverflow.com into your address bar.
I feel like Google-style "search" has made people really dumb and unable to help themselves.
the query is just to highlight that relevance is a complex topic. few people would consider "perl blog posts from 2016 that have the stack overflow tag" as the most relevant result for that query.
Confluence search does this, for our intranet. As a result it's barely usable.
Indexing is a nice compact CS problem; not completely simple for huge datasets like the entire internet, but well-formed. Ranking is the thing that makes a search engine valuable. Especially when faced with people trying to game it with SEO.
Murder has a fixed cost of human lives, which is considered (by the living) to be reprehensible at every scale.
Piracy has a negligible cost on the industry, and contributes to a positive upward pressure on IP holders to compete with low-cost access. These two crimes are not the same.
Oh, so you believe in mob rule then? OK, I got it. And no, because there are uncensored LLMs like Mistral, so it's a you-need-to-worry-about-yourself problem. Stop trying to parent me; who the hell are you?
“but a large enough majority of other people feel differently.
In other words, it’s a you problem.”
Ignoring the enormous strawman you just made: how do you know what the majority opinion is on this topic? You don't. You're just arrogant, because what you actually did was conduct a straw poll in your own mind of the people in your echo chamber and conclude, yeah, the majority of people think my opinion is right.
That's called mob rule.
Next time I'll speak more slowly so you can keep up; that's why it seems scattered, you're having trouble connecting the dots.
“The only thing worse than an idiot is an arrogant idiot.” You're the dumb one here; you're just too dumb to know it.
If you allocate a relatively big chunk of memory for each unit of work, and at some point your allocation fails, you can just drop that unit of work. What is not practical?
I think in that case overcommit will happily say the allocation worked. Unless you also zero the entire chunk of memory and then get OOM killed on the write.
I suppose you can try to reliably target "seriously wild allocation fails" without leaving too much memory on the table.
0: Heuristic overcommit handling. Obvious overcommits of
address space are refused. Used for a typical system. It
ensures a seriously wild allocation fails while allowing
overcommit to reduce swap usage. root is allowed to
allocate slightly more memory in this mode. This is the
default.
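To make the per-unit-of-work idea concrete, a minimal CPython sketch, with the caveat the kernel docs above imply: under heuristic overcommit the MemoryError may never fire, and the OOM killer can strike when the pages are first touched instead.

    # Allocate a working buffer per unit of work and drop the unit if the
    # allocation fails. Whether the MemoryError actually fires (rather than
    # the process being OOM-killed later) depends on overcommit settings.
    def handle(unit, buf):
        pass  # placeholder for the real per-unit work

    def process_units(units, buf_size=64 * 1024 * 1024):
        for unit in units:
            try:
                buf = bytearray(buf_size)  # working memory for this unit
            except MemoryError:
                print(f"dropping {unit!r}: allocation failed")
                continue
            handle(unit, buf)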
None of that matters: what is your application going to do if it tries to allocate 3mb of data from your 2mb allocator?
This is the far more meaningful part of the original comment:
> and furthermore most code is not in a position to do anything other than crash in an OOM scenario
Given that (unlike a language such as Zig) Rust doesn't use a variety of different allocator types within a given system, choosing to reliably panic with a reasonable message and stack trace is a very reasonable mindset to have.
Since we're talking about SQLite, by far the most memory it allocates is for the page cache.
If some allocation fails, the error bubbles up to a safe place, where some pages can be dropped from the cache and the failed operation can be tried again.
All this requires is that bubbling up this specific error condition doesn't allocate. Which SQLite purportedly tests.
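Very roughly, the shape of that pattern (a sketch of the general idea in Python, not SQLite's actual implementation; in SQLite the error path itself must not allocate):

    # Generic "release cache and retry" shape.
    def run_with_cache_relief(operation, page_cache: dict, max_retries: int = 1):
        for attempt in range(max_retries + 1):
            try:
                return operation()
            except MemoryError:
                if attempt == max_retries or not page_cache:
                    raise  # bubble up to the caller
                # Evict roughly half the cached pages, then retry.
                for key in list(page_cache)[: max(1, len(page_cache) // 2)]:
                    del page_cache[key]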
I'll note that this is not entirely dissimilar to a system where an allocation that can't be immediately satisfied triggers a full garbage collection cycle before an OOM is raised (and where some data might be held through soft/weak pointers and dropped under pressure), just implemented in library code.
Sure, and this is completely sensible to do in a library.
But that’s not the point: what can most applications do when SQLite tells them that it encountered a memory error and couldn’t complete the transaction?
Abort and report an error to the user. In a CLI this would be a panic/abort, and in a service that would usually be implemented as a panic handler (which also catches other errors) that attempts to return an error response.
In this context, who cares if it’s an OOM error or another fatal exception? The outcome is the same.
Of course that’s not universal, but it covers 99% of use cases.
The topic is whether Rust should be used to re-implement SQLite.
If SQLite fails to allocate memory for a string or blob, it bubbles up the error, frees some data, and maybe tries again.
Your app may be "hopeless" if the error bubbles up all the way to it, that's your choice, but SQLite may have already handled the error internally, retried, and given you your answer without you ever noticing.
Or it may at least have rolled back your transaction cleanly, instead of immediately crashing at the point of the failed allocation. And although crashing should not corrupt your database, a clean rollback is much faster to recover from, even if your app then decides to crash.
Your app, e.g. an HTTP server, might decide to drop the request, maybe close that SQLite connection, and stay alive to handle other ongoing and new requests.
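For example, a hedged sketch of such a handler using Python's sqlite3 bindings (exactly how an out-of-memory condition surfaces there may vary, so both MemoryError and sqlite3.Error are caught):

    # Drop one request on an out-of-memory (or other SQLite) error, but
    # keep the process alive to serve other requests.
    import sqlite3

    def handle_request(db_path: str, query: str, args: tuple = ()):
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # transaction: commits on success, rolls back on error
                return 200, conn.execute(query, args).fetchall()
        except (MemoryError, sqlite3.Error) as exc:
            return 503, f"temporarily unable to serve this request: {exc}"
        finally:
            conn.close()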
SQLite wants to be programmed in a language where a failed allocation doesn't crash, and unlike most other code, SQLite is actually tested for how it behaves when malloc fails.
In C++ it will throw an exception which you can catch, and then gracefully report that the operation exceeded limits and/or perform some fallback.
Historically, a lot of C code fails to handle memory allocation failure properly because checking malloc etc. for a null result is too much work, and C code tends to call it a lot.
Bjarne Stroustrup added exceptions to C++ in part so that you could write programs that easily recover when memory allocation fails - that was the original motivation for exceptions.
In this one way, Rust is a step backwards towards C. I hope that Rust comes up with a better story around this, because in some applications it does matter.
In the world of confusing landing pages, this project is a piece of art: what the fuck does any of this mean?
> EXCITABLE. EXACTABLE. EXECUTABLE. A shared universe of destinations and memory.
> Space is a SaaS platform presented by PromptFluid. It contains a collection of tools and toys that are released on a regular cadence.
> PromptFluid Official Briefing: You are reading from the ground. Space is above. We transmit because the colony cannot afford silence.
> You can ignore the story and still use everything. But if you want the deeper current: this relay exists because the colony is fragile, and Space is the only place the tools can grow without choking the ground.
> Creating an account creates a Star. Your Star is your identity within Space and unlocks Spacewalking capabilities and certain tools that require persistent state
Interesting read, but I feel like they should have also benchmarked using COPY with Postgres. This should be far faster than a bulk insert, and it’s more in line with what they are benchmarking.
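For anyone curious, client-side COPY is a near one-liner with psycopg2's copy_expert (the connection string, table, and file names here are made up):

    # Bulk-load a CSV via COPY instead of issuing many INSERTs.
    import psycopg2

    conn = psycopg2.connect("dbname=bench")
    with conn, conn.cursor() as cur, open("rows.csv") as f:
        cur.copy_expert("COPY bench_table FROM STDIN WITH (FORMAT csv)", f)
    conn.close()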
I maintain a project that publishes a SQLite file containing all package metadata, if you don't want to use BigQuery or the API to do this kind of analysis.