That particular section of the Wikipedia article seems to have gone through a bunch of anonymous edits back and forth around the content of this citation.
She was known for being _extremely talented_ at software development, particularly her knowledge of low-level hardware and how to optimize around its constraints.
I'm saying you can keep track of all the riders and drivers, matchmake, start/progress/complete trips, with a single server, for the entire world.
Billing, serving assets like map tiles, etc. not included.
Some key things to understand:
* The scale of Uber is not that high. A big city surely has < 10,000 drivers online simultaneously, probably fewer than 1,000.
* The driver and rider phones participate in the state keeping. They send updates every 4 seconds, but they only have to be online to start a trip. Both phones cache a trip log that gets uploaded when a network connection is available.
* Since drivers and riders send updates every 4 seconds, and since you don't need to be online to continue or end a trip, you don't even need an active spare for the server. A hot spare can rebuild the world state in 4 seconds. State for a rider or driver is just a few bytes each for id, position and status.
* Since you'll have the rider and driver trip logs from their phones, you don't necessarily have to log the ride server side either. It's also OK to lose a little data on the server. You can use UDP. (A rough sketch of this follows below.)
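To make the "tiny state" claim concrete, here's a rough sketch (made-up packet layout and field names, not Uber's actual protocol) of the whole world state living in one process, fed by UDP updates:

```rust
use std::collections::HashMap;
use std::net::UdpSocket;

// Hypothetical per-phone state: a few dozen bytes per rider/driver.
#[derive(Debug)]
struct Entity {
    lat: f64,
    lon: f64,
    status: u8, // e.g. 0 = idle, 1 = on trip (made-up encoding)
}

fn main() -> std::io::Result<()> {
    // One socket, one in-memory world; a hot spare could rebuild this map
    // from the next round of 4-second updates.
    let socket = UdpSocket::bind("0.0.0.0:9999")?;
    let mut world: HashMap<u64, Entity> = HashMap::new();
    let mut buf = [0u8; 25]; // 8 (id) + 8 (lat) + 8 (lon) + 1 (status)

    loop {
        let (n, _src) = socket.recv_from(&mut buf)?;
        if n < 25 {
            continue; // ignore malformed datagrams; losing a few is fine
        }
        let id = u64::from_le_bytes(buf[0..8].try_into().unwrap());
        let lat = f64::from_le_bytes(buf[8..16].try_into().unwrap());
        let lon = f64::from_le_bytes(buf[16..24].try_into().unwrap());
        world.insert(id, Entity { lat, lon, status: buf[24] });

        // Matchmaking and trip progression would scan/update `world` here.
        println!("{} tracked; last update: {:?}", world.len(), world.get(&id));
    }
}
```

At a few dozen bytes per active rider/driver, even a million concurrent users is only tens of megabytes of state, which is why a spare can rebuild it from a single 4-second update cycle.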
Don't forget that in the olden times, all the taxis in a city like New York were dispatched by humans. All the police in the city were dispatched by humans. You can replace a building of dispatchers with a good server and mobile hardware working together.
You could envision a system that used one server per county, and that's about 3,000 servers. Combine rural counties to get that down to 1,000, and that's probably fewer servers than Uber runs.
What the internet will tell me is that Uber has 4,500 distinct services, which is more services than there are counties in the US.
The reality is that, no, that is not possible. If a single core can render and return a web page in 16ms, what do you do when you have a million requests/sec?
The reality is most of those requests (now) get mixed in with a firehose of traffic, and could be served much faster than 16ms if that is all that was going on. But it’s never all that is going on.
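For scale, the arithmetic behind that question, with nothing stack-specific assumed:

```rust
fn main() {
    // One core rendering a page in 16 ms can serve 1000 / 16 = 62.5 requests/sec.
    let requests_per_core = 1000.0_f64 / 16.0;
    // So a sustained million requests/sec needs on the order of 16,000 cores,
    // before counting anything else those machines have to do.
    let cores_needed = 1_000_000.0_f64 / requests_per_core;
    println!("{requests_per_core} req/s per core -> {cores_needed} cores for 1M req/s");
}
```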
This looks super interesting for single-AZ systems (which are useful, and have their place).
But I can't find anything to support the use case for highly available (multi-AZ), scalable, production infrastructure. Specifically, a unified and consistent cache across geos (AZs in the AWS case, since this seems to be targeted at S3).
Without it, you're increasing costs somewhere in your organization - cross-AZ networking costs, increased cache sizes in each AZ to be available, increased compute and cache coherency costs across AZs to ensure the caches are always in sync, etc etc.
Any insight from the authors on how they handle these issues on their production systems at scale?
Not the author, but: it's a user-side read-through cache, so there's no need for pre-emptive cache coherence as such. There will still be a performance penalty for fetching data under write contention, irrespective of whether you have a single AZ or multiple AZs. The only way to mitigate that penalty is accurate predictive fetching that matches your usage patterns.
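In sketch form, the read-through pattern looks like this (all names made up, not this project's actual API): on a miss you fetch from the origin (e.g. S3) and fill the local cache, and if the blobs are immutable — as the project's description suggests — there is nothing to invalidate, so no coherence protocol is needed between AZs.

```rust
use std::collections::HashMap;

// Hypothetical read-through cache for immutable blobs: a hit is served
// locally with no cross-AZ coordination; a miss falls through to the origin.
struct ReadThroughCache<F>
where
    F: Fn(&str) -> Vec<u8>, // origin fetch, e.g. a GET against S3
{
    local: HashMap<String, Vec<u8>>,
    fetch_from_origin: F,
}

impl<F> ReadThroughCache<F>
where
    F: Fn(&str) -> Vec<u8>,
{
    fn new(fetch_from_origin: F) -> Self {
        Self { local: HashMap::new(), fetch_from_origin }
    }

    fn get(&mut self, key: &str) -> &[u8] {
        if !self.local.contains_key(key) {
            // Miss: pay the origin (possibly cross-AZ) cost once per AZ.
            let blob = (self.fetch_from_origin)(key);
            self.local.insert(key.to_string(), blob);
        }
        &self.local[key]
    }
}

fn main() {
    let mut cache = ReadThroughCache::new(|key: &str| {
        // Stand-in for the real origin fetch.
        format!("blob-bytes-for-{key}").into_bytes()
    });
    let first = cache.get("tile/1/2/3").to_vec();  // miss -> origin
    let second = cache.get("tile/1/2/3").to_vec(); // hit  -> local copy
    assert_eq!(first, second);
}
```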
Given that it's "Designed for caching immutable blobs", I guess the approach is indeed to increase the cache size in each AZ or eat the cross-AZ networking costs.
Thank you for your comment and we love your take on this! Coming to what we think about John Carmack's arguments against building a custom XR OS at Meta:
It's kinda hard to have a single answer for this, but we like to think that this project did not work out for Meta in particular. We believe it was perhaps not the right project for a company like Meta, which already had other ongoing projects to deal with. Although a setback, that doesn't necessarily mean there's a fault in the idea itself. We'd like to address the problems that were mentioned with the development of a custom XR OS:
1) Cost: We've managed to do everything up until this point at a cost of $0. However, we're in the initial stages and expect to incur costs in the next few months. But those costs are mostly associated with building the company and our own line of products, not with the actual development and engineering of the OS. We've managed to remain extremely cost efficient and intend to continue that practice.
2) Burden on third-party/new developers: We anticipated this problem right when we started building the OS. Over time, we've come up with plans to counter it and are currently working on making sure that the barrier to creating applications and software on Xeneva remains as close to non-existent as possible. We intend to make the process easy, with as little learning curve as possible, so developers can port their existing software and applications to XenevaOS conveniently. We also want to create a beginner-friendly environment so that new programmers can build an app on XenevaOS from scratch easily.
The whole point is for it to cost less (ie, smaller size) for the sender and cost more (ie, larger size) for the receiver.
The compression ratio is the whole point... if you can send something small for next to no $$ which causes the receiver to crash due to RAM, storage, compute, etc. constraints, you win.
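On the receiving side, the usual defense is to budget the decompressed size up front instead of trusting the compressed size. A rough sketch (assumes the flate2 crate; the budget number is arbitrary):

```rust
use std::io::Read;

const MAX_DECOMPRESSED: u64 = 64 * 1024 * 1024; // 64 MiB budget (arbitrary)

fn decompress_bounded(compressed: &[u8]) -> std::io::Result<Vec<u8>> {
    let decoder = flate2::read::GzDecoder::new(compressed);
    let mut out = Vec::new();
    // `take` stops reading once the budget is exhausted, so a tiny input
    // that expands to terabytes cannot exhaust RAM or disk.
    decoder.take(MAX_DECOMPRESSED + 1).read_to_end(&mut out)?;
    if out.len() as u64 > MAX_DECOMPRESSED {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "decompressed size exceeds budget; possible decompression bomb",
        ));
    }
    Ok(out)
}
```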
Yep, I love Apple, follow them closely, own a Mac Studio with an M3 Ultra and a MacBook Pro with an M4 Max, and it's still confusing. :)
I mean, surely a Mac Studio with an M4 Max must be the best, right? It's an entire CPU generation ahead and it's maximum! Of course, it's not... the M3 Ultra is the best.
While I get that there are use cases for physical media, as both a data hoarder and data paranoiac (good things in my line of work), I've moved on. It's the data that matters, not the media.
To that end, I have automated two separate, but complementary, processes for my personal data:
- Sync to multiple machines, at least one of which is offsite, in near realtime. This provides rapid access to data in the event of a disk/system/building/etc failure. Use whatever software you want (Dropbox, Resilio, rsync scripts, etc), but in the event of failure, this solves 99% of my issues - I have another device that has fast access to my most recent data, current to within seconds. This is especially important when bringing up a new, upgraded system - just sync over the LAN. (Currently this is 4 devices, 2 offsite, but it flexes up/down occasionally over time. A rough sketch of this leg follows after this list.)
- Backup to multiple cloud providers on a regular cadence (I do hourly encrypted incrementals). This protects me against data loss, corruption, malware attacks, my own stupidity deleting something, etc. This solves the remaining 1% of my issues, enabling point-in-time recovery for any bit of data in my past, stretching back many years. I've outsourced the "media" issue to the cloud providers in this case, so they handle whatever is failing, and the cost is getting absurdly cheap for consumers and will continue to drop. My favorite software is Arq Backup, but there are lots of options. (Currently this is 4 discrete cloud providers, and since this is non-realtime typically, it utilizes the coldest storage options available.)
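A rough sketch of the sync leg, assuming rsync is installed and using placeholder paths (a real setup would lean on the sync tool's own scheduling or filesystem notifications rather than a fixed sleep):

```rust
use std::process::Command;
use std::{thread, time::Duration};

fn main() {
    // Placeholder source directory and remote destination.
    let source = "/home/me/data/";
    let dest = "backup-host:/srv/mirror/data/";

    loop {
        // Mirror the directory to the remote machine, compressed, over SSH.
        let status = Command::new("rsync")
            .args(["-az", "--partial", source, dest])
            .status();
        match status {
            Ok(s) if s.success() => {}
            other => eprintln!("rsync run did not complete cleanly: {other:?}"),
        }
        thread::sleep(Duration::from_secs(30));
    }
}
```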
Between these two complementary, fully automated approaches, I no longer have to worry about the mess of media failure, RAID failures, system failures, human error, getting a new system online, cloud provider being evil, etc etc.
Are you sure about that? Many ransomware attackers do recon for some time to find the backup systems and then render those unusable during the attack. In your case your cloud credentials (with delete permissions?) must be present on your live source systems, rendering the cloud backups vulnerable to overwrite or deletion.
There are immutable options in the bigger cloud storage services but in my experience they are often unused, used incorrectly, or incompatible with tools that update backup metadata in-place.
I’ve encountered several tools/scripts that mark a file as immutable for 90 days the first time it is backed up, but don't extend that date correctly on the next incremental, leaving older but still critical data vulnerable to ransomware.
I discovered recently that Microsoft OneDrive will detect a ransomware attack and provide you with the option to restore your data to a point before the attack!
MS need to advertise this feature more, because I'd never heard of it and assumed all the files on the PC were toast!
Of course, the fact that a script on Windows can be accidentally run and then quietly encrypt all the user's files in the background is another matter entirely!
Actually, I think almost all malware worms are totally automated. The attacker knows nothing about your network and backups; it just encrypts and deletes absolutely everything it has write access to.
No-delete credentials present a cost issue when moving away from a provider... I've accidentally left data behind after I thought I'd deleted it. Worth the risk, though, and I learned my lesson.
You don't set the lifecycle rule at runtime. You set it at environment setup time. The credentials that put your objects don't have to have the power to change lifecycle rules.
You obviously don't put your environment setup user in your app. That would be utterly retarded.
And when you're moving providers you use your application credentials to do that? That makes no sense. This is nonsensical engineering. You'd use your environment credentials to alter the environment.
I'm not "engineering" anything - I'm just stopping a service. I close the account, or disable billing, or whatever that step requires. I don't even read the data back out or anything - just cancel. Doesn't really require "engineering".
You seem well placed to answer this one: how is cost for this resilience? Compare to the cost of the storage itself? Including the cost of migrating from solutions that are withdrawn from the market?
The cost (I assume you're talking about "my time" cost?) is unbelievably low, in large part due to good software. It "just works".
Specifically, Arq Backup, for example, lets you simply add/remove providers at will. It's happened multiple times, Amazon Drive changed (or went away? I forget...), Google Drive changed their Enterprise policies, etc... No big deal, I just deleted the provider and added another one. I still had plenty of working provider backups, so I wasn't worried while it took a day or two to fill the next provider. (Good argument for having 2+, I'd argue 3+ providers...)
Using notifications from your sync apps/systems/scripts/whatever is essential, of course, in case something fails... but all the good software has that built-in (including email and other notifications, not just OS, which helps for remote systems).
At this point, it's nearly idiot proof. (Good for idiots like me ;)
I meant more monetary cost. Nominally cloud storage for one unit of storage and one unit of time is perfectly "fine". Except that it adds up. More storage, indefinitely held, multiple hosts. Data which needs to be copied from one to the other which incurs costs from both. Add to this routine retrieval costs - if you "live this way". And routine test retrieval costs if you know what's good for you.
So last time I looked, unit costs were low - sure. But all-included costs were high.
Certainly some of this simply comes down to "how valuable is my data?".
Currently, given the extremely low (and dropping YoY) cost of storing cold data at rest, the essentially free cost of ingest, and the high cost of retrieving cold data which I almost never have to do, the ROI is wildly positive. For me.
And since all of these things (how many providers, which providers, which storage classes, how long to retain the data, etc) are fine-tunable, you can basically do your own ROI math, then pick the parameters that work for you.
I get some peace of mind (in both professional and business settings) from having backup include a physically separable and 100% offline component. I like knowing an attacker would need to resort to kinetic means to completely destroy all copies of the data.
The physically separable component often lags behind the rest of the copies. It may only be independently verified on an air-gapped machine periodically. It's not the best copy, for sure.
I still take comfort in knowing attackers generally won't launch a kinetic attack.
Conceptually, though, I think my separation of "sync vs backup" and my separation across discrete providers (both software and supplier) accomplishes this same goal. Conceptually, it's not very different, or possibly just a level up, from "online media vs archive media". At least, it seems that way to me.
Mr Meteorite can launch such an attack, but as long as you have two physically gapped backups at a distance greater than the likely blast radius you'll be fine.
Backups and archival are different things, with similar requirements, but different priorities. A backup doesn't care about data you think you'll never need again.
I've deployed code with Claude that was >100X faster, all-in (verification, testing, etc) than doing it myself. (I've been writing code professionally and successfully for >30 years).
>100X isn't typical, I'd guess typical in my case is >2X, but it's happened at large scale and saved me months of work in this single instance. (Years of work overall)
If you write code 100x faster, you could probably automate almost all of it away, since it seems to be super trivial and an already-solved problem.
I also use Claude for my coding a lot, but I'm not sure about the real gains, or whether it even gives me a noticeable speed improvement, since I'm in a loop of "waiting" and then fixing bugs the LLM made. Still, it is super useful for writing big docstrings and smaller tests. Maybe if I focus on some basic tasks like classical backend or frontend work it'll be more useful.
FYI, I forked and improved [1] a Rust implementation that supports both table- and SIMD-accelerated CRC-64/NVME [2] calculations. The SIMD-accelerated (x86/x86_64 and aarch64) version delivers 10X over the table-based (16-bytes-at-a-time) implementation.
The original implementation [3] did the same thing but for CRC-64/XZ [4].
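For anyone curious what the baseline looks like, here's a minimal byte-at-a-time table version of CRC-64/NVME (the simple baseline, not the 16-bytes-at-a-time slicing table or the SIMD folding path in the repo), using the catalogued CRC-64/NVME parameters:

```rust
// CRC-64/NVME parameters (reflected form): poly 0xAD93D23594C93659,
// init = xorout = all ones, input and output reflected.
const POLY_REFLECTED: u64 = 0x9A6C_9329_AC4B_C9B5;

fn make_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut i = 0usize;
    while i < 256 {
        let mut crc = i as u64;
        let mut bit = 0;
        while bit < 8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ POLY_REFLECTED } else { crc >> 1 };
            bit += 1;
        }
        table[i] = crc;
        i += 1;
    }
    table
}

fn crc64_nvme(data: &[u8]) -> u64 {
    let table = make_table(); // in real code, build this once, not per call
    let mut crc = u64::MAX;
    for &byte in data {
        let idx = ((crc ^ byte as u64) & 0xFF) as usize;
        crc = table[idx] ^ (crc >> 8);
    }
    crc ^ u64::MAX
}

fn main() {
    // Catalogued check value for "123456789" should be 0xae8b14860a799888.
    println!("{:#018x}", crc64_nvme(b"123456789"));
}
```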
This is computing CRC-64, not CRC-32, so there's not really a comparison. But perhaps most importantly, ours works with a variety of polynomials (there are a lot! [1])... we're just using the NVME one, but it's trivially adaptable to most (all?) of them. (The Intel instruction you link to only works with two - CRC32 and CRC32C)
Finally, it's based on Intel's paper [2], so they also believe it's extremely fast. :)