Iterators and Generators

The ability to procedurally create a collection of new values or access a collection of existing values is an integral feature of any modern programming language. One of the tools that Javascript provides to facilitate this is the iteration protocol. This protocol can be subdivided into two complimentary parts; an iterator protocol, and an iterable protocol. An iterator is something that is able to iterate through a collection. An iterable is a collection that is capable of being iterated (by an iterator).

The Iterator Protocol

Iterators always act on collections in one of two modes; they either create a new collection, or traverse an existing one. The exact behavior is dependent on the implementation. An iterator that creates new values requires additional logic, so that will be reserved for later exploration; this introductory segment will assume that a collection has been externally instantiated.

A collection is simply a sequence of related values. Javascript offers several different data structures (also known as built-in types) for storing collections. A popular choice is the array. While there are many ways to traverse a collection of elements in an array, the humble for loop remains a classic.

const arr = [1, 2];

for (let i=0; i < arr.length; ++i) {
    let currentElement = arr[i];
    console.log(currentElement);
}
// output: 1, 2

Each time the loop executes, a new value from the array arr is assigned to currentElement and logged to the console. This is perfectly functionable and serviceable iteration code. However, the pattern of basic collection iteration is so common that the array data structure itself provides a standardized mechanism to facilitate this behavior. That mechanism is the iterator object, which can be accessed by calling a special method that arr inherits from the Array built-in type. The method is keyed by the Symbol.iterator symbol, so a call to arr[Symbol.iterator] will return the iterator.

const arr = [1, 2];

const iterator = arr[Symbol.iterator]();
console.log(typeof iterator)
// object

Note that Symbol.iterator must be called using square bracket notation. Once a reference to the iterator object has been assigned to its own variable, it can be used to iterate over the elements of the data structure from which it was extracted. This is accomplished by successive invocations of the object’s next method. Each call to this method is analogous to another execution of a for loop.

const arr = [1, 2];
// externally instantiated collection

const iterator = arr[Symbol.iterator]();
// iterator object reference is
// assigned to its own variable

let firstIteration = iterator.next();
console.log(firstIteration);
// {value: 1, done: false}

let secondIteration = iterator.next();
console.log(secondIteration);
// {value: 2, done: false}

let thirdIteration = iterator.next();
console.log(thirdIteration);
// {value: undefined, done: true}

Notice that calling iterator.next() doesn’t return the next element of arr as a raw value. The returned element is instead stored in the appropriately named value property of an object, alongside a boolean property called done. The purpose of including done as a boolean flag is to signify when the iteration of a collection is complete. This allows the above snippet to be refactored as follows:

const arr = [1, 2];
const iterator = arr[Symbol.iterator]();

let isDone = false;
while(!isDone) {
    let nextIteration = iterator.next();
    if (nextIteration.done) {
        isDone = true;
    } else {
        console.log(nextIteration.value);
    }
};
// output: 1, 2

Every execution of the while loop triggers another call of the iterator’s next method, which returns both the next value of array arr and a boolean to determine whether or not the while loop should run again. When the done property comes back as true, the loop exits and the iteration cycle is complete.

The iterator object’s structure and behavior demonstrated in the preceding examples can be distilled down into 2 rules which govern the iterator protocol:

  1. The iterator must be an object that implements a next method
  2. The next method must return an object with two properties: a boolean property called done and a property called value.

These rules can also be expressed by the following interface:

interface Iterator {
  next() {
     // optional additional logic
     return {
        value: <any>,
        done: <boolean>
     };
  };
}

When an iterator with the above specifications is iterating over an existing collection, its default behavior is to return the next element, (formatted as { value: <value>, done: false }) every time iterator.next() is called, until the done property is equal to true.

Iterable Protocol

The complimentary counterpart to the iterator protocol is the iterable protocol. While the iterator protocol describes the object which iterates over a collection, the iterable protocol describes how the collection in question implements that object.

The example code in the previous section revealed that the Array built-in type returns an iterator upon a call to the function defined on its Symbol.iterator property. This is the crux of the iterable protocol, which can be condensed into a single rule:

  1. A collection implementing an iterator object must return that iterator object from a method keyed by the Symbol.iterator symbol.

In this context, a method is simply a function that is a property of a collection. By convention, the method responsible for returning an iterator is called the @@iterator method. Some types of collections have default @@iterator methods defined on their underlying data structures. These default methods are made available to instances of that data structure through inheritance.

For example, in the previous section, the Symbol.iterator property didn’t exist on array arr directly; it was inherited from the Array built-in type, just like any other array method such as forEach or map. The following is a list of Javascript data structures that implement the iterable protocol by default:

  1. Strings
  2. Arrays
  3. Maps
  4. Sets

Initializing a new instance of any of these built-in types will give the collection stored in that instance access to an @@iterator method that returns a default iterator.

Iterator Consumers

While it’s possible to extract a reference to an iterator from its data structure and invoke the next method manually, this syntax is tedious and verbose. The real power of the iteration protocol is the standardized mechanism it provides to other higher-level language features. Most developers will use iterators through abstractions that leverage iterators under the hood. Examples include the for...of loop, the spread operator, and the Array.from() method. These constructs all use iterators implicitly without ever exposing the iterator’s inner workings to the developer.

To appreciate the utility of the iteration protocol, consider a comparison between two similar data structures that are differentiated by the presence of a default @@iterator method. The map data structure is an iterable, while the plain object data structure is not. In the code example below, instances of both are plugged into various iterator consumers. The code acting on the map is operational and will work as expected, but the code acting on the object must be wrapped in try/catch blocks since it will be unable to find a default @@iterator method and thus throw errors.

Custom Iterators

The default iterators provided by Javascript’s built-in data structures all exhibit analogous behaviors; they traverse each element of a collection stored within that data structure. However, iterators can do more than just traverse existing collections; they can create new ones as well. This requires a custom implementation, but the blueprints are already laid out by the iteration protocol itself.

The iterator protocol articulates the two rules of design for an iterator object; it must implement a next method, and that next method must return an object with value and done properties. The following code is minimally compliant, and while it is not practical, it will serve as a first draft for demonstrative purposes:

const iteratorObj = {
    next: function() {
        return {value: undefined, done: true}
    }
}

With the iterator protocol satisfied, the iterable protocol now requires attention. The single requirement is that the iterator object be returned by an @@iterator method keyed by the Symbol.iterator property. Strings, arrays, maps and sets already have an @@iterator method, so overwriting one of those defaults with a custom implementation could cause unexpected behavior and difficult-to-debug errors. However, Javascript objects do not include a default @@iterator method, so a brand new method can be defined on a plain object’s Symbol.iterator property with no adverse effects. Building on the above snippet:

const iterableObj = {};

iterableObj[Symbol.iterator] = function() {
    return iteratorObj;
}

The object iterableObj is now an iterable. However, there is no logic in its custom iterator (the iteratorObject) to create a new collection. Combining the above snippets, and bestowing the custom iterator with the logic to create a collection of integers 1 thru 3, produces the following:

const iterableObj = {};
// create iterable object

const iteratorObj = {
  arr: [1, 2, 3],
  next: function() {
    return {
        value: this.arr.shift(),
        done: this.arr.length == 0
    }
  }
};
// create iterator object

iterableObj[Symbol.iterator] = function() {
    return iteratorObj;
};
// return iterator object from 
// @@iterator method of the iterable

Every time iteratorObj.next() is called, the returned object’s value property will be equal to the next number in arr. When arr is empty, the returned object will be { value: undefined, done: true }.

A custom iterator that generates 3 sequential predefined numbers from a hardcoded array isn’t particularly useful, but the concepts delineated here can be expanded to construct custom iterators with practical applications. The following example showcases an iterable object with an iterator that generates 5 random numbers.

To truly delve into the power of iterators, the next section explores a special kind of iterator, and its associated syntax, that unlocks the next level of possibilities.

Generators

The syntax for implementing a custom iterator is verbose yet predictable, making it the perfect candidate for further abstraction. Enter the generator syntax. Instead of writing out the iterator object imperatively as the return value of an @@iterator function, the function* keyword (note the asterisk!) can be used to declaratively return an iterator object. These special functions are called generator functions, and the iterators they return are called generator objects. Even an empty generator function returns a valid iterator by default.

const iterableObj = {};

iterableObj[Symbol.iterator] = function* () {};

let generatorObj = iterableObj[Symbol.iterator]();

let firstIteration = generatorObj.next();
console.log(firstIteration);
// {value: undefined, done: true}

For comparison, the following code is required to achieve the same functionality without the generator syntax:

const iterableObj = {};

iterableObj[Symbol.iterator] = function() {
  const iteratorObj = {
    next: function() {
      return {value: undefined, done: true}
    }
  }
  return iteratorObj;
}

let iteratorObj = iterableObj[Symbol.iterator]();

let firstIteration = iteratorObj.next();
console.log(firstIteration);
// {value: undefined, done: true}

Calling the next method of a generator object returns the familiar convention of an object with the value and done properties. These values are determined by the logic inside of the generator function. Every time next is called on a generator object, its corresponding generator function executes.

Standard functions in Javascript use the return keyword to return a single value and exit. Generator functions follow the same routine, where the return value both specifies the value property to be returned by next, and signifies that the the generator is done executing (thus providing a value for the done property as well).

const iterableObj = {};

iterableObj[Symbol.iterator] = function* () {
  return "generator is done"
};

let generatorObj = iterableObj[Symbol.iterator]();

let firstIteration = generatorObj.next();
console.log(firstIteration);
// {value: "generator is done", done: true}

Any further calls to generatorObj.next() in the above example will return {value: undefined, done: true}; the generator has returned a single value and exited. When a generator needs to return multiple values, the yield keyword is used instead; yielding a value will deliver it from the generator function to the generator object, without exiting the generator function. This is one of the key features of generators. The yield keyword redirects the control flow of the program to outside the generator function while still maintaining all of that generator function’s internal state. This effectively translates into the ability to “pause” a function until its corresponding generator object invokes the next method again.

For example, a generator that yields an incrementing value can preserve its internal state (the i variable in this example) while the program is busy writing text to the console.

const iterableObj = {};

iterableObj[Symbol.iterator] = function* () {
  for (let i=0; i <= 5; ++i) {
    yield i;
  }
  return 0;
};

let generatorObj = iterableObj[Symbol.iterator]();

let firstIteration = generatorObj.next();
console.log(firstIteration);
// {value: 0, done: false}

console.log("While this text is being logged, the generator function's internal state is preserved. The for loop will pick up where it left off when generatorObj.next() is called again")

let secondIteration = generatorObj.next();
console.log(secondIteration);
// {value: 1, done: false}

Further calls to generatorObj.next() will produce successively incrementing values until the internal for loop exits, at which point the return statement signals the generator function to exit. Because generator functions execute “on demand,” they can theoretically use constructs like infinite loops; a yield statement inside of a while(true) loop will inexhaustibly return a new value every time next is called, but this will cause a memory error if used by an unrestrained iterable consumer like a spread operator.

Standalone Generator Functions

When an iterator is traversing an existing collection, it makes sense for the iterator to be stored on the existing collection’s underlying data structure. This ensures that any time something like an array is provided to an iterator consumer, the consumer can predictably locate an @@iterator method and access the array’s iterator object. However, when an iterator is generating a new collection, there’s no inherit benefit in attaching that iterator to a data structure that is empty. Generators address this issue by also being available as independent first-class functions.

The last code snippet demonstrated a generator function acting as a @@iterator method for a Javascript object. There’s nothing about the Javascript object that’s particularly conducive to the generator’s purpose of generating integers 0 through 5, so that generator can be extracted from its host object and implemented as a standalone function called generateNumbers.

function* generateNumbers() {
  for (let i=0; i <= 5; ++i) {
    yield i;
  }
  return 0;
};

let generatorObj = generateNumbers();

let firstIteration = generatorObj.next();
console.log(firstIteration);
// {value: 0, done: false}

let secondIteration = generatorObj.next();
console.log(secondIteration);
// {value: 1, done: false}

Like regular functions, generator functions can also take arguments. Adding two parameter to the generateNumbers function makes it much more flexible by enabling the generation of any set of numbers within a given range.

The ability to supply parameters to generator functions is a powerful feature, but doesn’t cover the use case of supplying new data to the generator after it begins running. Once a generator object has been returned from a generator function, the function parameters can’t be updated for subsequent iterations. This limitation is overcome by the yield keyword’s ability to act as a two-way conduit for data transmission between the generator object and the generator function; just as the generator function can supply data to the generator object, the generator object can supply data back into the generator function. This is done by passing a value to the generator object’s next method as an argument. This argument will be returned by the yield keyword inside of the generator function.

function* generateNumbers(min, max) {
  for (let i=min; i <= max; ++i) {
    let message = yield i;
    console.log(`logged from inside: ${message}`);
  }
  return 0;
};

let generatorObject = generateNumbers(5, 7);

console.log(generatorObject.next("1st message"));
console.log(generatorObject.next("2nd message"));
console.log(generatorObject.next("3rd message"));
console.log(generatorObject.next("4th message"));

// console output (in order):
// {value: 5, done: false}
// "logged from inside: 2nd message"
// {value: 6, done: false}
// "logged from inside: 3rd message"
// {value: 7, done: false}
// "logged from inside: 4th message"
// {value: 0, done: true}

Notice that the string “1st message” does not get logged to the console. Generators in Javascript exhibit the idiosyncratic behavior of ignoring the argument of the first (and only the first) next call. However, every other invocation of next successfully assigns its string argument to the variable message inside the generator function. This allows generators to act on data that was originated from outside the function between generator iterations.

Nested Generators

A regular function is capable of calling another function within its body.

function isEven(num) {
  return num % 2 == 0;
}

function evenOrOdd(input) {
 if (isEven(input)) {
   console.log("number is even");
 } else {
   console.log("number is odd");
 }
}

This behavior is possible because regular functions always return a single value. Generator functions technically also return a single value, which is the generator object. But the generator object isn’t intended to be the final result; rather, it is intended to be the control mechanism by which the generator function’s actual logic is invoked. Therefore, attempting to yield a call to a nested generator will yield the inner generator’s iterator object to the iterator object of the outer generator function. This is problematic because generator objects are not iterator consumers. That is, the next method cannot just return yet another iterator. A middleman is required to take the incoming iterator, consume it, and give the resulting sequence of values to the receiving iterator. That is the job of the yield* keyword.

When an asterisk is appended to the yield keyword, that makes it an iterator consumer.

function* outerGenerator() {
  yield* innerGenerator();
}

Assuming innerGenerator is a proper generator, it will return an iterator when called. If outerGenerator immediately yields this iterator to its generator object (which is itself an iterator), that generator object will be unable to consume it. To resolve this issue, the yield* keyword safely consumes the iterator of innerGenerator, and sends whatever sequence of values are emanated from that iterator to the generator object of outerGenerator.

Generator objects have a special dual nature; they implement both the iterator protocol, as they include a next method that supports standard iterator behavior, and the iterable protocol, as they are compatible with iterator consumers (such as yield*). However, they are not in themselves iterator consumers, which is why the yield* keyword is required.

In Summary…

Iterators

  • Javascript provides an iteration protocol that standardizes the traversal of existing collections and the creation of new collections.
  • The iteration protocol consists of an iterator protocol and an iterable protocol.
  • The iterator protocol requires that 1) the iterator must be an object that implements a next method, and 2) the next method must return an object with two properties: a boolean property called done and a property called value.
  • The iterable protocol requires that a collection implementing an iterator object must return that iterator object from a method keyed by the Symbol.iterator symbol.
  • Collections that are iterable by default include strings, arrays, maps and sets.
  • An iterator can be manipulated by calling its next method manually, or it can be be supplied to an iterator consumer this will use the iterator “under the hood” to perform some action.

Generators

  • Custom iterator implementations can be written with the generator syntax.
  • Generators are functions which return a special kind of iterator called a generator object.
  • Every call to a generator object’s next method will execute the body of the corresponding generator function.
  • Upon execution, generator functions can either return a value, which indicates the generator is done running, or yield a value, which pauses the internal state of a generator until it is run again.
  • A generator function can call another generator function and yield the values derived from the resulting generator object using the yield* keyword, which is itself an iterator consumer.
  • Generator objects implement both the iterator protocol and the iterable protocol, as they are both valid iterators and in turn can be iterated by an iterator consumer.
Published
Categorized as Post