> If temperature is high, then given the same input, it will output the same words. If the temperature is low, it will more likely output different words.
The other way around. Think of low temperatures as freezing the output while high temperatures induce movement.
But! If temperature implies graphing a non-linear function (a heat map), does it also imply something topological? Temperature is affected by adjacency, so would a topological/toroidal graph be more indicative of the selection set?
The term temperature is used because they are literally using Boltzmann's distribution from statistical mechanics: e^(-H/T) where H is energy (Hamiltonian), T is temperature.
The probability they assign to something with score H is just as in statistical mechanics, e^(-H/T), and they divide by the partition function (the sum over all options) to normalize. (You might recognize it written with beta = 1/T.)
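A minimal sketch of that distribution in Python (the function name and scores are my own; in language models the convention is exp(s/T) over logits s, which matches exp(-H/T) if you take H = -s):

```python
import math

def boltzmann_probs(scores, temperature):
    """Temperature-scaled softmax: p_i proportional to exp(s_i / T)."""
    exps = [math.exp(s / temperature) for s in scores]
    z = sum(exps)  # the partition function: sum over all options
    return [e / z for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical logits for three candidate words
# Low T sharpens the distribution toward the top-scoring word ("freezing");
# high T flattens it toward uniform, so unlikely words get picked more often.
print(boltzmann_probs(scores, 0.1))
print(boltzmann_probs(scores, 1.0))
print(boltzmann_probs(scores, 10.0))
```

Note that as T goes to 0 the distribution collapses onto the argmax (deterministic output), and as T grows it approaches uniform, which is exactly the freezing/movement picture above.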
Right, the term wasn't chosen to be understood from the user's perspective, a common trait in naming. It increases the jumble within the pool of possible next-word choices, the way heat increases Brownian motion. That's how I think of it, at least.