I like the idea, but most of the servers we manage have outgoing firewalls that block them from talking to the internet. We produce installed-package lists during deployment (as much as possible we run immutable pre-built images and replace the image rather than upgrade in place), which could be sent to a service like this, but we wouldn't want to start punching holes and adding routes for it. To work as-is we'd need to add duplicate canary servers in an isolated environment to talk to the service.
Dang, wish I'd known about that a week ago when I was kludging it for a couple of utilities (thinking "someone must have a library for this but I can't find it!"). Thank you.
No, there's more to us than just DNA. For example, methylation (the addition of methyl chemical groups to some bases, which isn't tracked in "normal" DNA sequencing) controls which genes get expressed by which cells. Plus there's a reasonable chance of errors in the sequencing, due to the need to copy the DNA repeatedly in order to identify the bases.
- Data size
Most bioinformatics data formats are plain ASCII, so even the reference data would be about 3 GB per person. But the 1,000 Genomes data contains sequencing reads where each DNA base is sampled multiple times (20-40x is a typical "read depth") so that errors in identifying bases can be minimised. Each base of each of these samples has an associated quality score (about a 6-bit value). Plus identifiers for all of the billion-odd reads per person.
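A rough back-of-envelope calculation, using an assumed ~3.2 billion base genome and 30x coverage (the numbers are illustrative, not exact 1000 Genomes figures):

```python
# Approximate size of raw sequencing data for one person, stored as ASCII.
GENOME_BASES = 3.2e9   # haploid human genome length, approx.
READ_DEPTH = 30        # typical coverage, somewhere in the 20-40x range
BYTES_PER_BASE = 2     # one ASCII byte for the base + one for its quality score

reference = GENOME_BASES * 1                     # reference: one byte per base
reads = GENOME_BASES * READ_DEPTH * BYTES_PER_BASE

print(f"reference: ~{reference / 1e9:.1f} GB")
print(f"raw reads: ~{reads / 1e9:.0f} GB (plus read identifiers on top)")
```

That lands in the same ballpark as a real uncompressed 30x FASTQ file, which is why these datasets are usually shipped compressed.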
For fairly simple organisms we have 'cold booted' DNA: we've swapped a cell's DNA with artificial DNA from a closely related organism and it still worked. So, while humans have a lot of genetic information, I suspect you could get it to work if you're willing to accept fairly low success chances.
As to read errors, each one is effectively just a 'mutation', and mutations are generally fairly harmless. If you stay below, say, 1,000 mutations (which would still require very high accuracy) you haven't significantly reduced your chances of success.
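To put "very high accuracy" in numbers, here's a quick sketch of the per-base error rate implied by a 1,000-mutation budget over a ~3.2 billion base genome (both figures are the rough assumptions from the comments above):

```python
# Per-base accuracy needed to keep total errors under the 1,000 budget.
GENOME_BASES = 3.2e9
MAX_ERRORS = 1_000

error_rate = MAX_ERRORS / GENOME_BASES
print(f"max tolerable error rate: ~{error_rate:.1e} per base")
print(f"i.e. per-base accuracy of roughly {(1 - error_rate) * 100:.5f}%")
```

So roughly one error per three million bases, which is why the deep read coverage mentioned above matters.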
We have the raw data on the actual DNA for several people, which is what you need to make a copy. What we don't know is what the data means and what all the mutations are in the wild, which is what you'd want to know before starting to make changes.
Actually, we don't have the full DNA sequence for any human. For example, if you look at the data from, say, the Genome Reference Consortium, the first 10,000 bases of Chromosome 1 are designated N, i.e. unknown.
True that we don't have a full sequence, but that's not the best example. The telomeres (the ends of each chromosome) consist of the same short set of bases repeated thousands of times. Recent research suggests that their length is probably super important. We're good at approximating the length, but not at measuring it exactly.
There are a bunch of regions of 'N' in the reference sequence, most are just repeats.
The genome is incredibly complex, and yes, there's much we still can't represent accurately. As one example, some genes are given a single location in the reference genome, while every person actually has multiple copies of them scattered across the genome.
It's a long time since I worked on mobile phones, but in GSM the phone has to keep an internal list of the tower it's connected to and its nearest neighbours in order to manage handover smoothly (i.e. without dropping calls). I suspect 3G is similar.
In GSM the phone also had to know the distance to each tower to within about 500 m in order to adjust the timing of its communication with the tower.
(It's 8 years since I did this stuff so my memory of the numbers might be a bit off :-)
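The ~500 m figure falls out of how GSM quantises timing advance: steps of one bit period (~3.69 µs), with the signal making a round trip, so each step is half a bit-period of one-way distance. A quick sketch, assuming the standard GSM bit rate of 13 MHz / 48:

```python
# Distance resolution of one GSM timing-advance step.
C = 299_792_458          # speed of light in m/s
BIT_PERIOD = 48 / 13e6   # GSM bit duration, ~3.69 microseconds

# Round trip: one step of timing advance covers half a bit-period one-way.
step_metres = C * BIT_PERIOD / 2
print(f"one timing-advance step ~= {step_metres:.0f} m")  # ~553 m
```

Which matches the "about 500 m" resolution remembered above.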
Yep, it's called cell-ID location. I worked in mobile-phone location about 8 years ago; you could find a rough position (about a 200 m radius, IIRC) and we used it as a first pass. That was on GSM networks; 3G cells are smaller.