For linear models, least squares leads to the BLUE estimator: the Best Linear Unbiased Estimator. That acronym is doing a lot of work, with each word carrying a specific technical meaning.
Fitting the model is also "nice" mathematically. It's a convex optimization problem, and in fact fairly straightforward linear algebra. The estimated coefficients are linear in y, and this also makes it easy to give standard errors and such for the coefficients!
Also, this is what you would do if you were doing Maximum Likelihood assuming Gaussian-distributed noise in y, which is a sensible assumption (though not a strict requirement for using least squares).
Also, in a geometric sense, it means you are finding the model that puts its predictions closest to y in terms of Euclidean distance. So if you draw a diagram of what is going on, least squares seems like a reasonable choice. The geometry also helps you understand things like "degrees of freedom".
The least squares model will produce unbiased predictions of y given x, i.e. predictions for which the average error is zero. This is the usual technical definition of unbiased in statistics, but it may not correspond to common usage.
Whether x is a noisy measurement or not is sort of irrelevant to this -- you make the prediction with the information you have.
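To make the "straightforward linear algebra" point concrete, here's a minimal sketch in R on synthetic data (everything here is illustrative): the normal equations and lm() give the same coefficients, lm() throws in standard errors, and the in-sample errors average to exactly zero.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)   # Gaussian noise here, but that's a convenience, not a requirement

X <- cbind(1, x)                              # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)     # normal equations: (X'X) beta = X'y

fit <- lm(y ~ x)                  # same coefficients, plus standard errors for free
summary(fit)$coefficients         # estimates, std. errors, t values, p values
mean(residuals(fit))              # zero (up to floating point): average in-sample error is zero
```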
R is so good in part because of the efforts of people like Di Cook, Hadley Wickham, and Yihui Xie to create a software environment that they like working in.
It also helps that in R any function can completely change how its arguments are evaluated, allowing the tidyverse packages to do things like evaluate arguments in the context of a data frame or add a pipe operator as a new language feature. This is a very dangerous feature to put in the hands of statisticians, but it allows more syntactic innovation than is possible in Python.
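As a toy illustration of that kind of argument capture (a made-up with()-style helper, nothing from the tidyverse itself):

```r
# The expression comes in as an unevaluated promise; substitute() captures it,
# and eval() then evaluates it with the data frame's columns in scope.
my_with <- function(data, expr) {
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

my_with(mtcars, mean(mpg / wt))   # mpg and wt resolve inside mtcars, not the global environment
```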
Like Python, R is a 2 (+...) language system. C/Fortran backends are needed for performance as problems scale up.
Julia and Nim [1] are dynamic and static approaches (respectively) to 1-language systems. Both have user-defined operators and macros. Personally, I find the surface syntax of Julia rather distasteful, and I also don't live in PLang REPLs / emacs all day long. Of course, neither Julia nor Nim is impractical enough to make calling C/Fortran all that hard, but the communities do tend to implement in the new language without much prompting.
Producing a diverse list of results may still help in a couple of ways here.
* If there are a lot of lexical matches, real semantic matches may still be in the list, but far down it. A diverse set of, say, 20 results may have a better chance of including a semantic match than the top 20 results by some score.
* There might be a lot of semantic matches, but the vast majority of them follow a particular viewpoint. A diverse set of results has a better chance of including the viewpoint that solves the problem.
Yes, semantic matching is important, but this is solving an orthogonal and complementary problem. Both are important.
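One way to make "a diverse set of 20" concrete is greedy, MMR-style selection: trade the existing relevance score off against similarity to what has already been picked. A rough sketch in R, where all the names are made up and the embeddings/similarity come from whatever the system already has:

```r
# Greedy diversified selection (MMR-style).
# docs:  matrix of result embeddings, one row per candidate
# score: relevance scores from the existing ranker
diversify <- function(docs, score, k = 20, lambda = 0.7) {
  cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a * a) * sum(b * b))
  selected   <- integer(0)
  candidates <- seq_len(nrow(docs))
  for (i in seq_len(k)) {
    gain <- sapply(candidates, function(j) {
      redundancy <- if (length(selected) == 0) 0 else
        max(sapply(selected, function(s) cos_sim(docs[j, ], docs[s, ])))
      lambda * score[j] - (1 - lambda) * redundancy   # relevance minus redundancy
    })
    pick       <- candidates[which.max(gain)]
    selected   <- c(selected, pick)
    candidates <- setdiff(candidates, pick)
  }
  selected   # indices of a relevant-but-diverse top k
}
```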
Here's a scenario. You're running a cluster, and your users are biologists producing large datasets. They need to run some very specific command-line software to assemble genomes. They need to edit SLURM scripts over SSH. This is all far outside their comfort zone. You need to point them at a text editor. Which one do you choose?
I've met biologists who enjoy the challenge of vim, but they are rare. nano does the job, but it's fugly. micro is a bit better, and my current recommendation. Neither is a perfect experience out of the box. If Microsoft can make that out-of-the-box experience better, something they are very good at, then more power to them. If you don't like Microsoft, make something similar.
> You're running a cluster, and your users are biologists producing large datasets. They need to run some very specific command-line software to assemble genomes. They need to edit SLURM scripts over SSH. This is all far outside their comfort zone. You need to point them at a text editor. Which one do you choose?
Wrongly phrased scenario. If you are running this cluster for the biologists, you should build a front end for them to "edit SLURM scripts", or you may find yourself looking for a new job.
> A Bioinformatics Engineer develops software, algorithms, and databases to analyze biological data.
You're an engineer, so why don't you engineer a solution?
What a strange web page. Scrolling is thoroughly broken.
I recently went to a two day workshop on whole cell modelling. I'm still trying to work out how much of the exercise is fantasy. I get that some of the chemistry is well enough understood to simulate from the ground up, but there's so much more to it.
The oddest thing to me is the level of satisfaction in being able to run the model. I would think the model has to be very very fast, because of all the work that needs to be done with it to fit it to data and fully understand its behavior.
Lol! Overriding basic scroll functionality on esoteric cell simulation software documentation pages is such a pointless gamble. Really calls the quality of the software itself into question.
Thanks for this. Despite the vintage it seems very clearly written; the introductory material on measure theory has already made it worthwhile for me.
As others have said in various ways, start by fitting a survival model using glmnet.
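A minimal sketch of that starting point (data and column names are placeholders; for family = "cox", glmnet takes the response as a two-column matrix of time and status):

```r
library(glmnet)

# x: numeric matrix of predictors
# time: follow-up time; status: 1 = event observed, 0 = censored
y <- cbind(time = time, status = status)

cvfit <- cv.glmnet(x, y, family = "cox")   # lasso-penalized Cox model, with CV to pick lambda
coef(cvfit, s = "lambda.min")              # the (sparse) set of selected predictors
```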
That said, here are some folks trying to use SDEs to model cells, they even have a "dW" on their logo. This is a long way from predicting age of death, but it might eventually give insights into the exact mechanism. Also I think they're starting with bacteria and yeast, so mice might be a way off.
A further step is Langevin Dynamics, where the system has damped momentum, and the noise is inserted into the momentum. This can be used in molecular dynamics simulations, and it can also be used for Bayesian MCMC sampling.
Oddly, most mentions of Langevin Dynamics in relation to AI that I've seen omit the use of momentum, even though gradient descent with momentum is widely used in AI. To confuse matters further, "stochastic" is used to refer to approximating the gradient using a sub-sample of the data at each step. You can apply both forms of stochasticity at once if you want to!
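For concreteness, here's a sketch of one update of each variant in plain R; grad_log_post stands in for whatever gradient of the log posterior (or a minibatch estimate of it) you have, and the step sizes are purely illustrative:

```r
# Overdamped Langevin (no momentum): a gradient step plus Gaussian noise.
overdamped_step <- function(theta, grad_log_post, eps) {
  theta + 0.5 * eps^2 * grad_log_post(theta) + eps * rnorm(length(theta))
}

# Underdamped Langevin (Euler-Maruyama discretization): damping and noise act on
# the momentum p, and theta just follows the momentum.
underdamped_step <- function(theta, p, grad_log_post, eps, gamma = 1) {
  p     <- p + eps * (grad_log_post(theta) - gamma * p) +
           sqrt(2 * gamma * eps) * rnorm(length(theta))
  theta <- theta + eps * p
  list(theta = theta, p = p)
}
```

Swapping grad_log_post for a subsampled (minibatch) estimate gives the "stochastic gradient" flavour, independently of whether momentum is used.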
The momentum analogue for Langevin is known as underdamped Langevin, which, if you optimize the discretization scheme hard enough, converges faster than ordinary Langevin. As for your question, your guess is as good as mine, but I would guess that the nonconvexity of AI applications causes problems. Sampling is a hard enough problem already in the log-concave setting…