Environments: under the hood of variables – Deep JavaScript chapter

wwright · on July 26, 2020

Do modern JS engines still map an entire lexical scope to an environment at all times? It seems like it would be fairly simple to optimize it down and only box up and share the variables that the code actually closes over.

hajile · on July 26, 2020

Modern JITs make a C++ class to represent objects. Closures are just objects.

We'll ignore non-strict code which allows external functions to access variables. If you want to actually eliminate these variables, you'd have to make TWO classes per function. Next you'd have to COPY all the relevant stuff from one class to the other. Maybe you instead keep the same class and add some code to make the various pointers null. That saves all the copying at the expense of some literals taking up the same space, but doesn't solve this next issue.

JS has weak reference maps/sets along with upcoming weak values. If I reference an object in a closure (one that is used in a weak map), but don't return that variable, what happens? If it's set to null, then something is going to potentially be GC'd when it shouldn't be.

Perhaps this kind of lifetime issue could be detected with a sufficiently smart JIT, but in such a dynamic language without types, I doubt it's possible.

ridiculous_fish · on July 26, 2020

There's a lot of misunderstanding here. A closure does not store the variables it creates. Instead, the closure's code block will allocate space for those variables when it is called.

The GC must trace all reachable values. This includes the stack, and also variables captured by closures which are themselves reachable.

The same principles apply whether or not a JIT is in use.

hajile · on July 27, 2020

I didn't misunderstand what I was writing (though I may not have succeeded in conveying what I meant). At compile time, a hidden class will be created to represent all the variables (both user-visible and user-inaccessible). Every function is actually called by an internal [[Call]]. This will create an instance of that class which is placed on the heap and may or may not copy some values onto the stack (it's essentially a necessary performance optimization, but not actually a hard requirement in all ISAs).

A closure is an object that has only one public method or property (a function application sometimes called apply). In theory, if you could guarantee that the apply method and all nested lexical scopes never accessed a variable and the variable was not reference counted, you could safely null the entry in the class and allow whatever it pointed to to be GC'd. My only contention is that this is easier to theorize about than to actually do.

ridiculous_fish · on July 26, 2020

Captured variables live in an environment, while non-captured variables live on the stack.
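A rough sketch of that distinction, assuming a typical engine (where a variable physically ends up is an implementation detail, and the names `makeCounter` and `scratch` are made up for illustration):

```javascript
function makeCounter() {
  // Not referenced by the inner function, so an engine is free to keep it
  // on the stack (or in a register) and drop it when makeCounter returns.
  const scratch = [1, 2, 3].length;

  // Referenced by the inner function, so it has to outlive this call and
  // is kept in an environment that the returned closure points to.
  let count = scratch;

  return () => ++count;
}

const next = makeCounter();
const results = [next(), next()];
console.log(results); // [ 4, 5 ]
```

Only `count` needs to survive after `makeCounter` returns; nothing can observe `scratch` any more.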

xg15 · on July 26, 2020

This is also cause for the following entertaining bug:

  for(let i=0; i<buttons.length; i++) {
    buttons[i].addEventListener('click', e => alert('button ' + i + ' clicked!'));
  }

The above code is supposed to show a unique number for each button when the button is clicked. What it will actually do is to show the same number for all buttons, the number being equal to buttons.length.

The reason is that an event handler does not inherit the value of i when it is created, but a pointer to the environment in which i is defined. Because all event handlers point to the same parent environment, they will also pick up changes to i's value that happen after they were created.

If you add a seemingly redundant function call to separate the environments, everything will work. So the following code will work correctly:

  function makeHandler(index) {
      return e => alert('button ' + index + ' clicked!');
  }

  for(let i=0; i<buttons.length; i++) {
    buttons[i].addEventListener('click', makeHandler(i));
  }

goranmoomin · on July 26, 2020

That's not the case with a let-binding, which creates its own scope for blocks. I think you're mixing it up with var, which, yes, does have the problem that you're mentioning. (And hence the universal advice to avoid using var.)
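A minimal way to compare the two without any buttons (a sketch; `collect` is a hypothetical helper that gathers closures instead of attaching click handlers):

```javascript
// Build three closures in a loop, then call them after the loop has finished.
function collect(useVar) {
  const handlers = [];
  if (useVar) {
    // var: one function-scoped i shared by every closure
    for (var i = 0; i < 3; i++) handlers.push(() => i);
  } else {
    // let: each iteration gets its own j
    for (let j = 0; j < 3; j++) handlers.push(() => j);
  }
  return handlers.map(h => h());
}

console.log(collect(true));  // [ 3, 3, 3 ]
console.log(collect(false)); // [ 0, 1, 2 ]
```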

daniel-s · on July 26, 2020

Wait, so his bug has a bug in it and if tested would bug-out by working properly?

xg15 · on July 26, 2020

Yes, due to using let instead of var, combined with special behavior of for(;;).

aikah · on July 26, 2020

> The above code is supposed to show a unique number for each button when the button is clicked. What it will actually do is to show the same number for all buttons, the number being equal to buttons.length.

No, "let" will work as expected (lexical scoping). It's var that works at function scope, which is why "var" usage is discouraged.

It's very confusing to JS newcomers, that's why it is important to learn the history of the language.

ylyn · on July 26, 2020

Actually it's not just lexical scoping.

If you had pure lexical scoping, given the way the for statement works, you would expect it to show the same value because it is the same i being captured.

JS actually specifically handles a for (let ...) specially by creating a _copy_ of the variable. See the spec[1] (the third clause starting with for (LexicalDeclaration ...)).

[1]: https://www.ecma-international.org/ecma-262/11.0/index.html#...
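One way to observe the copy, as a sketch (`peek` and `seen` are made-up names): a closure created in the initializer captures the binding that exists before the first per-iteration copy is made, so it never sees the increments.

```javascript
const seen = [];

// peek is created once, in the initializer, and closes over the original i.
for (let i = 0, peek = () => i; i < 3; i++) {
  // i here is the current iteration's copy; peek() still reads the original.
  seen.push([i, peek()]);
}

console.log(seen); // [ [ 0, 0 ], [ 1, 0 ], [ 2, 0 ] ]
```

If there were a single `i` for the whole loop, both columns would be identical.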

xg15 · on July 26, 2020

Thanks for the replies. I really wasn't aware of the behavior of let here and did indeed learn something. The "buggy" code really does work.

It's important to note that this is not just due to scoping but that a regular for loop apparently really makes copies of its environment for each iteration, as ylyn described[1].

You can demonstrate this as follows:

  const objs = []; 
  for (let i = 0; i < 3; i++) { // i++ seems to be executed after copying but before execution of the loop body.
    objs.push({ 
      getI() {return i},
      add10() {i += 10} 
    });
  }

  // each object gets a copy of i from its respective iteration:
  console.info(objs[0].getI(), objs[1].getI(), objs[2].getI()); // prints 0 1 2

  // the copies can be modified independently:
  objs[0].add10();
  objs[2].add10();
  console.info(objs[0].getI(), objs[1].getI(), objs[2].getI()); // prints 10 1 12

I'm not sure I like this design, to be honest. This is an awfully complex special case hidden in something as mundane-looking as a regular for loop.

Seems to me, this can make understanding the general principles behind scopes and environments harder at the expense of making a particular special case more convenient.

[1] https://news.ycombinator.com/item?id=23955809

hencq · on July 26, 2020

The book Crafting Interpreters has a design note specifically about this special case [0] and how different languages handle it:

> The increment clause really does look like mutation. That implies there is a single variable that’s getting updated each step. But it’s almost never useful for each iteration to share a loop variable. The only time you can even detect this is when closures capture it. And it’s rarely helpful to have a closure that references a variable whose value is whatever value caused you to exit the loop.

> The pragmatically useful answer is probably to do what JavaScript does with let in for loops. Make it look like mutation but actually create a new variable each time because that’s what users want. It is kind of weird when you think about it, though.

[0] https://craftinginterpreters.com/closures.html#design-note

johnfn · on July 26, 2020

This seems pretty standard to me. Could you produce an example of a language which doesn't have this behavior?

EDIT:

I think I understand your confusion. You've gotten confused because you're expecting `y = 5; let x = y` to be able to hold a reference to `y` the number in the same way that `y = {}; let x = y` holds a reference to y the object. But that's honestly the behavior of practically every language I could name - numbers and other small values that can fit in a register are always copied.

You can see the referential behavior if you change your `i` to an object:

    let result = []

    for (let obj = {}; Object.keys(obj).length < 3; obj[Math.random()] = "hello") {
       result.push(obj);
    }

    console.log(result); // notice all 3 entries in the array are the same

But yeah - JavaScript does nothing special in either of these cases. Similar code will get you similar results across any language I could name. The only way to get around it would be to manually mark your numeric i as a reference in a language that has them, like C/C++/Rust.

tomxor · on July 26, 2020

The confusion is not about assignment behavior, it's about block scopes, specifically within the for loop.

Before JS had `let` and `const` there was no built-in way of obtaining a block scope (for, if, else, while, do and {}); you had to obtain one artificially by passing variables into a function scope, e.g. a predefined function or an IIFE.

I'm talking about `var`... paste this into your browser's console, you get 8 logged 8 times:

  for (var i = 0; i < 8; i ++) {
   setTimeout(() => {
    console.log(i)
   }, 100);
  }

I'm pretty sure this is the issue the OP was trying to demonstrate.

With `var`, i is _not_ bound to the for loop scope, it's bound to the closest parent function scope. The key difference is not only that i can be accessed after the for loop, but that there is only one parent function scope, so i is effectively overwritten - whereas if i were bound to the for block there would be a separate scope for each iteration.

Since the timeout function has no i in its scope, it walks up the parent scopes. It can only find i in the parent function scope, and by the time it has executed, i will be whatever the last iteration assigned it - or, even more confusingly, what something after the loop assigned it.
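For comparison, the pre-`let` workaround mentioned above looked roughly like this (a sketch; `callbacks` and `captured` are illustrative names, and the closures are collected and called directly rather than via setTimeout so the result is easy to check):

```javascript
const callbacks = [];

for (var i = 0; i < 3; i++) {
  // Each IIFE call creates a new function scope, so `captured` is a
  // separate variable per iteration even though `i` is shared.
  (function (captured) {
    callbacks.push(function () { return captured; });
  })(i);
}

console.log(callbacks.map(cb => cb())); // [ 0, 1, 2 ]
```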

This is why `let` was added:

  for (let i = 0; i < 8; i ++) {
   setTimeout(() => {
    console.log(i)
   }, 100);
  }

Which gives you 0 through 7

let assigns i to the for loop's scope, with a different one for each iteration, this preserves the local context the setTimeout function was defined in which is intuitive and what people expect.

Another nice thing that people forget with this feature (including myself, out of habit) is that you can now replace IIFEs with block scope literals {} (provided you stop using var, which you should have anyway).

In short, var was a nightmare, its only benefit is obfuscation, it was the source of countless needless bugs, and that's why we now have const/let and block scopes.
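A small sketch of that replacement (`total` and `tmp` are made-up names):

```javascript
let total = 0;

{
  // tmp exists only inside this block, the way a local inside an IIFE would.
  const tmp = 21;
  total = tmp * 2;
}

console.log(total);      // 42
console.log(typeof tmp); // undefined (tmp is not visible out here)
```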

xg15 · on July 26, 2020

Yes, exactly. I think what was confusing to me here is the interaction between child environments in for(;;).

Usually, scopes in JS work like the OP link describes them: If a block is entered, a child environment is created that has a reference to its parent environment. A variable either belongs to the parent and is shared with all children - or belongs to one particular child and is initialised through assignment. However different child environments can never influence each other.

for(;;) does something different: It takes one child environment (of iteration n) and clones it for iteration n+1. The effect is that there is a single let i = ... statement which leads to the creation of several independent variables.

It's easy to see why this was done - it's the only way how you can have both mutating statements like i++ and closures that capture state from a particular iteration - but I'm not aware this is done anywhere else in JS.

tomxor · on July 26, 2020

You are right, although this doesn't feel all that exotic to me, even though it might be unique for built-ins. It's essentially the same as what happens when you pass variables to functions (you get a copy of the references, i.e. they are added to the scope). In this sense a block-scoped for loop feels similar to a series of function calls passing and returning a variable.

In fact you could emulate a block scoped for loop with an object with little trouble and without being all that confusing:

  const FOR = (s, c, f) => c(s) &&
    (f({...s}), FOR(s, c, f))

  FOR({i: 0}, s => s.i++ < 8, s => {
    setTimeout(() => {
      console.log(s.i)
    }, 100)
  })

This doesn't prove anything, but the fact that a block-scoped for can almost be implemented within one small function makes me feel like it's not all that magic, rather it's just nice syntax.

[edit]

getting a bit silly now but was seeing how close I could get to built-in syntax... yes this is horrible, never use `with` or `eval` like this.

  const EVIL = (f, s) => eval(`
    with (s) {
      ${(f + '').replace(/^.+>/, '')}
    }
  `)

  const FOR = (s, ce, fe, sb, ii) => {
    if (EVIL(ce, s)) {
      EVIL(sb, {...s})
      EVIL(fe, s)
      FOR(s, ce, fe, sb, 1)
    }
  }

  FOR ({i: 0}, _=> i < 8, _=> i++, _ => {
    setTimeout(() => {
      console.log(i)
    }, 100)
  })

xg15 · on July 26, 2020

You have to clone from the previous iteration - because in theory, the block itself could modify the loop variable again and that modification gets lost if you keep cloning from the initial environment.

So your function would produce incorrect results for cases like this:

  for (let i = 0; i < 100; i++) {
    i += 10;
    setTimeout(() => console.log(i), 1000);
  }

I think the following should work though:

  function FOR(env, cond, incr, block) {
    if (cond(env)) {
      block(env);
      const env2 = {...env};
      incr(env2);
      FOR(env2, cond, incr, block);
    }
  }

  FOR({i: 0}, s=>(s.i < 8), s=>s.i++, s=>{
    setTimeout(() => {
      console.log(s.i)
    }, 100)
  });

And yeah, I agree, it's not really complicated to implement (as long as you don't care about the waste of memory). It was just an unexpected bit of logic at that point.

mbrock · on July 26, 2020

But closures still hold references to variables, even if they are numbers.

    let counter = () => {
      let i = 0
      return () => { ++i; return i }
    }

    let x = counter()
    console.log(x(), x(), x()) // 1 2 3

If you reimplement the OP's for (let i...) loop with a loop where the i variable is declared before the loop, you will get the behavior OP mentioned.
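A sketch of that reimplementation, collecting plain closures instead of click handlers (`fns` is a made-up name):

```javascript
const fns = [];

let i = 0; // a single binding, declared outside the loop
for (; i < 3; i++) {
  fns.push(() => i); // every closure refers to that same i
}

console.log(fns.map(f => f())); // [ 3, 3, 3 ]
```

With no declaration in the loop head there is nothing to copy per iteration, so all closures see the final value.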

jedimastert · on July 26, 2020

I've been working on a minimax tic tac toe solver[0] to stretch my legs in JS and I ran into this exact same bug. I ended up making objects for the positions instead[1]

[0]:https://github.com/amtunlimited/ttt-js

[1]:https://github.com/amtunlimited/ttt-js/blob/0994d6ff03b338a6...

dgb23 · on July 26, 2020

Rust has a related concept of capturing references by default:

https://doc.rust-lang.org/rust-by-example/fn/closures/captur...

dgb23 · on July 26, 2020

Is the comparison too far off to be useful or straight up incorrect?

millstone · on July 26, 2020

It reads like an attempt to hijack the thread to make it about Rust.

dgb23 · on July 27, 2020

Thank you for clarifying. I can see how it may come off as this.

I guess this is a symptom of learning language X in our free time in combination with a "I know this" dopamine hit while being in the early learning stages.

By the way, I always thought JS closing over references (instead of values) was a feature. A closure over a mutable reference or pointer essentially becomes an object interface. Otherwise defaulting to 'const' makes the intent clear.
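Roughly what I mean, as a sketch (`makeAccount` is a hypothetical example, not from any library):

```javascript
function makeAccount(initial) {
  let balance = initial; // mutable, reachable only through the closures below

  return {
    deposit(amount) { balance += amount; },
    balance: () => balance,
  };
}

const acct = makeAccount(10);
acct.deposit(5);
console.log(acct.balance()); // 15
```

Both methods close over the same `balance` variable, so the pair behaves like an object with private state.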