Glimmer's Optimizing Compiler
We’ve made several improvements since I last wrote about the Glimmer VM, so I wanted to take a moment to give a quick overview of our recent work. Over the summer, Yehuda, Tom, and I focused on improving initial render performance by optimizing how we compile templates.
From the beginning, the Glimmer VM has compiled templates to a JSON data structure. While most other libraries compile components to JavaScript (including React, Vue, and Angular), JSON has the advantage of parsing much faster and requiring less memory in the user’s browser.
While this approach is relatively fast and has greatly reduced the payload size of Ember applications in practice, we considered it to be a stepping stone to the ultimate goal: compiling templates into a binary program that is both space-efficient and skips the parsing step altogether.
Glimmer VM Overview
Like many VMs, the Glimmer VM executes a sequence of numbers known as bytecode. This bytecode is just an encoded set of instructions for rendering the UI. For example, a template of <h1>Hello World</h1> would be compiled to the following wire format at build time.
[
  ["open-element", "h1", []],
  ["text", "Hello World"],
  ["close-element"]
]
At runtime the above wire format would be compiled into roughly the following program and constant pool:
const Program = [25, 1, 0, 0, 22, 2, 0, 0, 32, 0, 0, 0];

const ConstantsPool = {
  strings: [/* ... */ 'h1', 'Hello World'],
  // ...
};
Originally, each instruction was a sequence of four 32-bit integers, where the first 32 bits described the type of operation. This is typically known as an opcode. The remaining 96 bits were reserved for the opcode’s arguments, known as operands. In the example above, “25” is the opcode for creating an element and takes “1” as an operand, which is just a pointer used to look up “h1” in the constants pool. Likewise, “22” is the opcode for appending text, and “2” is a pointer into the constants pool for “Hello World”. Once we have compiled into these data structures, the VM starts executing the program by iterating over it and mapping each opcode to an opcode handler.
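To make the dispatch concrete, here is a minimal sketch of that loop, assuming the original four-integer layout; the Handler type and execute function are illustrative names, not Glimmer’s actual internals.

type Handler = (op1: number, op2: number, op3: number) => void;

// Step through the program four integers at a time and hand each
// opcode's operands to its registered handler.
function execute(program: number[], handlers: Handler[]): void {
  for (let pc = 0; pc < program.length; pc += 4) {
    const opcode = program[pc];
    handlers[opcode](program[pc + 1], program[pc + 2], program[pc + 3]);
  }
}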
Each opcode handler is just a small function that does one thing and does it well. For instance, the “open-element” opcode handler does the following:
APPEND_OPCODES.add(Op.OpenElement, (vm, { op1: tag }) => {
  vm.elements().openElement(vm.constants.getString(tag));
});
As you can imagine, each of these opcode handlers has relatively low overhead, as it does not perform very much work. This leads us to our hypothesis: by pre-computing the binary data and the constants pool at build time, we can greatly improve runtime performance.
The Bundle Compiler: An Optimizing Compiler For Glimmer Templates
With the binary VM in place, we worked on lifting the JIT compiler out so we could run it at build time. Doing so lets us produce the binary program and pre-computed constants pool ahead of time, which means the wire format that was sent to the client in previous versions of the VM becomes just an intermediate representation (IR) for the bundle compiler. While moving the compiler to build time may seem fairly straightforward, much of the work went into designing a system for linking up JavaScript objects at runtime. Because the JIT compiler ran in the browser, it had the luxury of referencing live JavaScript objects to call into. The bundle compiler, by contrast, must fully compute the program needed to render by parsing the templates alone, without calling into user-defined objects.
To solve this problem we use the concept of “handles”. A handle is just a unique integer identifier assigned to the external objects referred to in templates, such as other components and helpers. At compile time, we simply “stub” the place in the program where a JavaScript object would be accessed with a handle. At runtime, the host environment is responsible for exchanging a handle for the appropriate live JavaScript object. This indirection allows us to serialize the program to binary data along with the constants pool. While pre-computing these structures at build time is already a large performance win, we can do further optimizations, because we have full program knowledge at build time.
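As a rough illustration of this indirection (the names below are hypothetical, not Glimmer’s actual API), the compiler only ever records integers, and the host owns the table that turns them back into live objects:

type Handle = number;

// Build time: each external reference gets a stable integer handle
// that can be serialized into the binary program.
class HandleAllocator {
  private handles = new Map<string, Handle>();
  private next: Handle = 0;

  handleFor(name: string): Handle {
    let handle = this.handles.get(name);
    if (handle === undefined) {
      handle = this.next++;
      this.handles.set(name, handle);
    }
    return handle;
  }
}

// Runtime: the host environment exchanges a handle for the live object.
class RuntimeResolver {
  constructor(private table: unknown[]) {}

  resolve(handle: Handle): unknown {
    return this.table[handle];
  }
}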
Packaging Optimizations
The first optimization was moving to a more compact bytecode layout. As we mentioned above, an instruction had a fixed size of four 32-bit integers; however, the vast majority of opcodes take zero or one operands. This meant that a large portion of the program was zeros that didn’t need to be there. Furthermore, the VM has only about 80 opcodes, so we can shrink the space reserved for the opcode from 32 bits to 8 bits. To address these size issues we landed on a conservative yet effective 16-bit encoding scheme. The first 8 bits (T) represent the opcode, which allows for 256 possibilities, followed by 2 bits (L) that encode the number of operands; we have purposely left the remaining 6 bits unused to allow for growth. Each operand is then a 16-bit integer (A). This ends up looking like the following:
/*   fixed 16-bit header  |  number of operands varies   */
0b000000LLTTTTTTTT, 0bAAAAAAAAAAAAAAAA, 0bAAAAAAAAAAAAAAAA, 0bAAAAAAAAAAAAAAAA
In our experiments, this layout and compaction reduced the program size by more than 50%. Decoding this layout has very little overhead, as we are simply masking and shifting bits to recover the opcode and operand count.
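For illustration, decoding the header might look like the following sketch (the function name is ours; the masks follow the layout above):

// Recover the opcode (low 8 bits, T) and the operand count
// (next 2 bits, L) from a 16-bit instruction header.
function decodeHeader(header: number): { opcode: number; operands: number } {
  const opcode = header & 0b11111111;
  const operands = (header >> 8) & 0b11;
  return { opcode, operands };
}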
The second optimization was interning the constants pool, which ensures there is a single representation of each constant. If we expanded the example above to add another h1, we would simply hand back 1 instead of pushing the string again and handing back a new address. A typical application potentially has hundreds of repeated strings, which add up in size.
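A minimal sketch of such an interning pool (illustrative, not Glimmer’s implementation) shows why repeats are free:

class InternedStrings {
  private pool: string[] = [];
  private indexes = new Map<string, number>();

  // Hands back the existing index for a repeated string instead of
  // appending a duplicate entry to the pool.
  add(value: string): number {
    let index = this.indexes.get(value);
    if (index === undefined) {
      index = this.pool.push(value) - 1;
      this.indexes.set(value, index);
    }
    return index; // repeated adds of the same string return the same index
  }
}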
With these two optimizations alone, templates have become a zero-cost abstraction in terms of bytes sent to the client. In other words, the template you write is compiled into code that is the same size or smaller.
Runtime Performance
One of the key reasons for doing this work is that we can completely side-step the browser’s parse and compile pipeline for a large portion of application code. Since the browser sees an ArrayBuffer as just a fixed-size block of raw memory, the binary program can be handed to the VM directly, instead of being parsed and compiled into native code like the JavaScript payloads shipped by the overwhelming majority of JS frameworks. This allows the VM to start up quickly and greatly reduces the interleaving of execution, parsing, and compiling. While Tom and I are still working on getting these changes into a real application, our stress test showed a 3.5x speedup when using the optimizing compiler over the runtime compiler. It’s also worth noting that the browser’s DOM APIs, such as setAttribute and createElement, were the highest items in the profile, rather than some piece of library code.
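Loading such a program is then little more than fetching bytes and wrapping them in a typed-array view; a sketch, assuming the program is served as a static asset at a hypothetical URL:

// Fetch the precompiled binary and view it as 16-bit integers,
// matching the encoding described above. No parse step is needed.
async function loadProgram(url: string): Promise<Uint16Array> {
  const response = await fetch(url);
  const buffer = await response.arrayBuffer(); // raw memory
  return new Uint16Array(buffer);
}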
A Quick Word On SSR
While much of our time has been spent making client-side rendering faster, we have also landed rehydration for server-side rendered applications. Unlike other solutions that rehydrate a server-rendered application, the Glimmer VM can rehydrate using requestIdleCallback, allowing the page to remain interactive. This is enabled by the fact that the program is linear and the VM’s state is driven forward by an iterator, giving us fine-grained control over when to execute, so we can arbitrarily yield back to the main thread. We hope to talk more about this in a future update.
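As a rough sketch of that scheduling (assuming, for illustration, that the VM exposes an iterator-style next() that advances rendering one step at a time):

// Drive the VM forward only while the browser reports idle time,
// then yield back to the main thread and resume on the next idle slot.
function rehydrateDuringIdle(vm: { next(): { done: boolean } }): void {
  function work(deadline: IdleDeadline): void {
    while (deadline.timeRemaining() > 0) {
      if (vm.next().done) return; // rehydration complete
    }
    requestIdleCallback(work);
  }
  requestIdleCallback(work);
}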
What’s Next
Over the next couple of months, we plan to make the optimizing compiler and rehydration the default way to build and run applications that use the Glimmer VM. Once that is done, we plan to continue working on enabling more performance-related features like SSR streaming and Web Workers. I’m super excited to roll this work out to Glimmer and Ember applications.