Got things running at about 1 ns per iteration, which seems good enough.
Adopted a linear strategy for evaluating the bytecode, rather than a
recursive or even imperative one; this also lets me elide the offsets
and store the bytecode in half the space. Half expecting to find out
that some formulas are evaluated wrongly, but I couldn't find a
counterexample. Also restructured things a bit to avoid repeated
allocations when evaluating this way.
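A minimal sketch of what linear evaluation can look like. It assumes (the log doesn't say) that the bytecode is a postfix encoding of an arithmetic formula. Every operator comes after its operands, so a single left-to-right pass with a stack is enough and no child offsets need storing. The opcodes, the nested-tuple formula shape, and the names compile_postfix and evaluate are all hypothetical. The caller-owned, preallocated stack stands in for the allocation change mentioned above. Recursion appears only in the one-time compile step, never in evaluation.

```python
from enum import IntEnum


# Hypothetical opcodes; the real bytecode format isn't described in the log.
class Op(IntEnum):
    VAR = 0  # followed by one operand: a variable index
    ADD = 1
    MUL = 2
    NEG = 3


def compile_postfix(node, code):
    """Flatten a nested-tuple formula into postfix bytecode.

    node is ("var", i) | ("neg", x) | ("add", l, r) | ("mul", l, r).
    Children are emitted before their parent, so an operator's inputs are
    always on the stack when it runs and no child offsets are stored.
    """
    kind = node[0]
    if kind == "var":
        code.extend((Op.VAR, node[1]))
    elif kind == "neg":
        compile_postfix(node[1], code)
        code.append(Op.NEG)
    else:
        compile_postfix(node[1], code)
        compile_postfix(node[2], code)
        code.append(Op.ADD if kind == "add" else Op.MUL)
    return code


def evaluate(code, env, stack):
    """Evaluate postfix bytecode in one forward pass, without recursion.

    stack is a preallocated list owned by the caller and reused across
    calls; values are written at index sp instead of appending, so the
    list itself is never resized.
    """
    sp = 0  # number of live values on the stack
    i = 0
    while i < len(code):
        op = code[i]
        if op == Op.VAR:
            stack[sp] = env[code[i + 1]]
            sp += 1
            i += 2  # skip the operand slot
            continue
        if op == Op.NEG:
            stack[sp - 1] = -stack[sp - 1]
        elif op == Op.ADD:
            sp -= 1
            stack[sp - 1] += stack[sp]
        else:  # Op.MUL
            sp -= 1
            stack[sp - 1] *= stack[sp]
        i += 1
    return stack[0]


if __name__ == "__main__":
    # (x0 + x1) * -x2
    code = compile_postfix(
        ("mul", ("add", ("var", 0), ("var", 1)), ("neg", ("var", 2))), []
    )
    stack = [0] * len(code)  # safe upper bound on stack depth
    print(evaluate(code, [2, 3, 4], stack))  # prints -20
    print(evaluate(code, [1, 1, 1], stack))  # same stack reused; prints -2
```

Compiling (x0 + x1) * -x2 yields VAR 0, VAR 1, ADD, VAR 2, NEG, MUL. A stack of len(code) slots is always enough, because only VAR pushes and each VAR occupies two slots of code.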