Daniel Waterworth

A Complicated Function

So, I'm working on a library for dataflow programming (think spreadsheets). It's called derivation. It's not ready to used in production yet, but it's coming together nicely. Actually, it's not just one library. Aside from the core library, there's a library for syncing state with broswers via websockets and for syncing state with postgres.

In the core library, reactive values (values that change over time) are represented using ReactiveValue<X>. These are like cells in a spreadsheet. You can create base values and derived reactive values. New values are computed right away. Then, later, you can update the base values and call graph.step() and all the affected derived values will be recomputed.

You've likely seen systems like this before. There are a half dozen libraries like this already in typescript/javascript. Perhaps you've seen MobX, Jotai or Solid.js. There are subtle distinctions between each of these, but what makes derivation different is that it's push-based, has global timesteps and derived values have explicit dependencies (rather than being inferred implicitly).

But, what's really interesting about derivation is what I'm building on top of it.

With @derivation/composable, you can have nested, reactive collections and perform operations on them which update efficiently. i.e. if you filter a reactive list, it won't re-evaluate the predicate for every element every time the list changes. Instead, it'll only re-evaluate the predicate for items that have changed (and, in fact, the predicate might not need to be entirely re-evaluated).

This works, because, under the hood, a Reactive<X> has the stream of Xs like you'd expect, but, importantly, it also contains the stream of Changes<X>, which are like diffs.

The meat of this library are these primitive operations which statefully compute the changes to the output of an operation based on the changes to the input. And, of course, these primitive functions are all very general. For instance:

function mapList<X, Y>(
  graph: Graph,
  list: Reactive<List<X>>,
  f: (x: Reactive<X>) => Reactive<Y>,
): Reactive<List<Y>> 

Think for a second about what this function has to do. We need to build Reactive<X>s from the input, but we don't want to trigger an update for every Reactive<X> for every change to the input. We need to maintain a collection of Reactive<Y>s, and then, somehow, we need to piece together the Reactive<Y>s into a Reactive<List<Y>>, without having to build the list from scratch every time an element changes.

This is fiendishly complicated and it's out of reach of what modern, frontier LLMs can achieve (believe me, I've tried). The state space is just very large, so you have to decompose the problem to make it manageable, and, it's not even obvious what kind of decomposition would even help.

So, let me show you my implementation of this function:

function mapList<X, Y>(
  graph: Graph,
  list: Reactive<List<X>>,
  f: (x: Reactive<X>) => Reactive<Y>,
): Reactive<List<Y>> {
  const [structure, map] = decomposeList(graph, list);
  const mappedMap = mapMap(graph, map, f);
  return composeList(graph, structure, mappedMap);
}

Yeah, three lines. What we're doing here is separating handling the ordered nature of the input and output from dealing with mapping over the elements.

decomposeList gives you a Reactive<List<ID>> and a Reactive<Map<ID, X>>. The important property, which makes this a step forward, is that the IDs are primitive values which never change. Given those constraints, we can put the list back together again with composeList.

mapMap is another very difficult function to write, but at least it has stable indices. And, you can imagine that composeList and decomposeList might be useful for implementing other list operations in terms of map operations.

Present day LLMs don't program like this (they'd sooner spew hundreds of lines of buggy, difficult to follow code), but I don't think it's because they are architecturally incapable of doing it. I think they just lack the training data.