The GPU takes model-space objects (a long list of vertices, three per triangle), a camera position, and a vertex shader that transforms them into screen space. It then splits the triangles into fragments (pixels) and runs the provided fragment shader (also called a pixel shader in DX-land) on each fragment.
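For concreteness, here is a minimal vertex shader sketch in GLSL. It assumes three.js-style built-ins (position, uv, projectionMatrix and modelViewMatrix are injected by the framework); in raw WebGL you would declare those attributes and uniforms yourself.

    // Passed on to the fragment shader, interpolated across each triangle.
    varying vec2 vUv;

    void main() {
      // Forward the mesh's texture coordinates.
      vUv = uv;
      // Model space -> camera space -> clip (screen) space.
      gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
    }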

The inputs are defined by the programmer. In this case they are a uniform float called time (the same value is given to every fragment) and a varying vec2 called vUv (a different value for each fragment: the vertex shader outputs a value at every vertex, and when the fragments are created, the value is linearly interpolated between the three relevant vertices based on the fragment's position within the triangle). Here vUv is just the coordinates of a pixel on the sphere.

gl_FragColor is the single output, an RGBA vector representing the color applied to that fragment.
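Put together, a fragment shader with those inputs and that output might look like this. It is only a sketch of the plumbing, not the actual shader from the article; the sine-based pattern is made up.

    precision mediump float; // required in WebGL fragment shaders unless the framework adds it

    uniform float time;   // same value for every fragment in a frame
    varying vec2 vUv;     // interpolated per fragment from the vertex outputs

    void main() {
      // Arbitrary example pattern: pulses over time, varies across the surface.
      float pulse = 0.5 + 0.5 * sin(time + vUv.x * 10.0);
      gl_FragColor = vec4(pulse, vUv.y, 1.0 - pulse, 1.0); // RGBA, each in 0..1
    }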

The only things missing compared to the more typical shaders used in games and, for example, Google Earth are texturing and lighting. Normally you give the fragment shader a texture handle, which it uses with texture coordinates provided by the vertex shader to sample the texture; for lighting, the vertex shader provides a surface normal, which you can combine with a uniform giving the location of the light source to correctly shade the fragment.
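A sketch of what that version might look like; the names map, lightDir and vNormal are my own, and I use a directional light rather than a light position to keep it short (a point light would also need the fragment's own position).

    precision mediump float;

    uniform sampler2D map;    // texture handle
    uniform vec3 lightDir;    // direction towards the light, same space as the normal
    varying vec2 vUv;         // texture coordinates from the vertex shader
    varying vec3 vNormal;     // surface normal from the vertex shader

    void main() {
      vec3 albedo = texture2D(map, vUv).rgb;  // sample the texture
      // Simple Lambertian diffuse term.
      float diffuse = max(dot(normalize(vNormal), normalize(lightDir)), 0.0);
      gl_FragColor = vec4(albedo * diffuse, 1.0);
    }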

If running a separate program for each and every pixel sounds inefficient, it would be, but the programming model here is actually SIMT: you write the program as if it works on a single pixel, but it is executed on a wide SIMD array, processing 32 (NVIDIA) or 64 (AMD) pixels at a time. The cost is that conditional branches are essentially converted into predicated operations, meaning that, to a first approximation, you always execute both sides of any if statement.
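A sketch of what that means in practice; shadeA and shadeB are hypothetical stand-ins for two different code paths.

    precision mediump float;

    varying vec2 vUv;

    // Hypothetical stand-ins for two different shading paths.
    vec3 shadeA() { return vec3(1.0, 0.0, 0.0); }
    vec3 shadeB() { return vec3(0.0, 0.0, 1.0); }

    void main() {
      // If, within one 32/64-wide group, some fragments take the 'if' and
      // others the 'else', the hardware runs both paths with the inactive
      // lanes masked off, so the cost is roughly the sum of both branches.
      vec3 color;
      if (vUv.x > 0.5) {
        color = shadeA();
      } else {
        color = shadeB();
      }
      gl_FragColor = vec4(color, 1.0);
    }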



Thanks, that helped! The idea of massively parallel computation like that was kind of what I was leveraging; I was thinking "I guess I treat this vec2 as a NumPy array of vectors?" It's cool that the job gets batched behind the scenes.



