Does this generalize to higher dimensions? I’m realizing my mathematics education never really addressed alternative coordinate systems outside of 2/3 dimensions
The last method mentioned at that wolfram.com link should work for any dimension (i.e. choosing random Cartesian coordinates with a normal distribution, then normalizing the vector obtained thus to get a point on the sphere).
The method presented in the parent article is derived exactly from this method of Marsaglia, and it should also work for any dimension.
There should conceptually be something similar for higher dimensions, but I'm not sure that it only involve reasonable functions, where by reasonable functions I mean functions you are likely to find in the math library of your programming language.
Here's an outline of where the θ = 2πu, φ = acos(2v-1) where u, v are uniform random values from 0 to 1 comes from.
If you just picked a uniform random latitude and longitude for each point the distribution would not be uniform, because lines of latitude vary in length. They are smaller the farther they are away from the equator. You need to pick the longer lines of latitude more often than you pick the shorter lines of latitude. The probability density function (PDF) for picking latitude needs to be proportional to latitude length.
If you have a non-uniform distribution and you need to pick an x from it but only have a uniform random number generator there is a trick. Figure out the cumulative density function (CDF) for your PDF. Then to pick a random value use your uniform random number generator to pick a value y from 0 to 1, and find the x such that CDF(x) = y. That's your x. I.e., pick x = CDF^(-1)(y).
Latitude length is proportional to sin(φ) or cos(φ) of latitude depending on whether you are using 0 for the equator or 0 for one of the poles (I think the Mathworld article is using 0 for one of the poles), which makes the PDF proportional to sin(φ) or cos(φ). The CDF is the integral of PDF so ends up proportional to cos(φ) or sin(φ). Using the inverse of that on your uniform random numbers then gives the right distribution for you latitude lengths. Thus we have the acos() in the formula for φ. The argument is 2v-1 rather than v to cover both hemispheres.
You could do the same things in higher dimensions. For a 4D sphere, you could slice it into parallel "lines" of "hyper-latitude". Those would be 2D instead of the 1D of the lines of latitude on a 3D sphere. Then to pick a random point you would chose a line of hyper-latitude at random, and then randomly place a point on that line of hyper-latitude.
Like with the 3D sphere you would need to weigh the lines of hyper-latitude differently depending on where they are on the 4D sphere's surface. Do something similar with the PDF and CDF to use the CDF^(-1) to pick one the lines of hyper-latitude with your uniform random number generator.
You then need to place a point uniformly on that slice of hyper-latitude. That's a 2D thing in this case rather than the 1D we had to deal with before so we will need two more random numbers to place our point. I have no idea what the hell it looks like, but I'd guess it is not going to be "square", so we can't simply use two uniform random numbers directly. I suspect 2D thing would also have to sliced up in parallel lines of "sub-latitude" and we'd have to do the whole PDF and CDF and inverse CDF things there too.
I think in general for an N dimensional sphere you would end up with placing your random point involves picking 1 coordinate directly with your uniform random number generator, and the other N-2 would involve inverse CDFs to get the right distributions from the uniform random number generator.
I have no idea whatsoever if those CDFs would all be simple trig functions like we have in the 3D case, or would be more complicated and possibly not have a reasonably efficient way to compute their inverses.