Introduction to ILCompiler.Reflection.ReadyToRun



Introduction

ILCompiler.Reflection.ReadyToRun.dll is a new library I introduced in late 2019. It is meant to make the data embedded in a ReadyToRun binary accessible to other code.

Background

What is ReadyToRun?

ReadyToRun is a new native compilation technology for .NET Core. For people who know about NGEN, it can be roughly thought of as the .NET Core version of NGEN.

What is native compilation?

When code written in a managed language (such as C#) is compiled, it is compiled into an intermediate language named MSIL. At runtime, the MSIL is JIT-compiled into native instructions for execution. When an application starts up, none of the MSIL has been jitted yet, which can cause a noticeable delay in response time. To make startup faster, we can perform the compilation ahead of time and save the resulting native code so that the JIT compilation can be skipped at startup.

Difference between NGEN and ReadyToRun

To optimize for performance, the JIT takes advantage of the fact that it knows its execution environment. In particular, suppose we are jitting method B, and method B calls method A. The JIT can take advantage of the fact that the compiled code only needs to work with the exact version of A that is loaded. It can even inline A into B’s body.

This is great at runtime because it is fast, but it is fragile for ahead-of-time compilation. Suppose method A actually resides in another assembly and we change that assembly. The natively compiled method B is no longer valid, and we have to invalidate the code. This has a cascading effect. In the old days, if mscorlib.dll was updated, all native code caches were invalidated and needed to be recompiled.
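The fragility can be illustrated with a toy model. The sketch below is plain Python, not .NET, and every name in it (Library, NativeImage, compile_b, is_valid) is hypothetical. "Compiling" B captures the implementation of A that was visible at compile time, which plays the role of inlining, so the compiled B silently goes stale when the library changes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Library:
    """Hypothetical stand-in for the assembly that contains method A."""
    version: int
    A: Callable[[int], int]


@dataclass
class NativeImage:
    """Hypothetical stand-in for the precompiled code of method B."""
    compiled_against: int  # library version seen at compile time
    B: Callable[[int], int]


def compile_b(lib: Library) -> NativeImage:
    # "Inline" A: capture the implementation that exists right now.
    a = lib.A
    return NativeImage(compiled_against=lib.version, B=lambda x: a(x) * 10)


def is_valid(image: NativeImage, lib: Library) -> bool:
    # The precompiled code is only trustworthy for the exact version it saw.
    return image.compiled_against == lib.version


lib_v1 = Library(version=1, A=lambda x: x + 1)
image = compile_b(lib_v1)

# The library is updated after B was compiled ahead of time.
lib_v2 = Library(version=2, A=lambda x: x + 2)

stale_result = image.B(1)               # still uses the old A: (1 + 1) * 10
fresh_result = compile_b(lib_v2).B(1)   # recompiled against v2: (1 + 2) * 10
must_recompile = not is_valid(image, lib_v2)
```

In this model the only safe reaction to a version mismatch is to throw the image away and recompile, which is the invalidation described above.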

To address the problem, ReadyToRun was born. The idea is to introduce the concept of a version bubble. Managed assemblies tend to be updated as a cluster. We allow the native compiler to depend on the implementation details of a dependency only if that dependency falls within the same version bubble as the code being compiled. That allows an individual version bubble to be updated without the change propagating to other bubbles.
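As a rough sketch of the idea, the following toy model compares how far an invalidation spreads with and without version bubbles. It is written in Python for brevity and is only an assumption-laden illustration: the assembly names, the dependency graph, and the invalidated helper are all made up, and the real compiler and runtime do not track dependencies this way.

```python
from typing import Callable, Dict, Set


def invalidated(changed: str,
                deps: Dict[str, Set[str]],
                sees_internals: Callable[[str, str], bool]) -> Set[str]:
    """Return the assemblies whose native code must be recompiled.

    An assembly is affected when it depends on an affected assembly AND
    was allowed to rely on that dependency's implementation details.
    """
    result: Set[str] = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for asm, uses in deps.items():
            if current in uses and asm not in result and sees_internals(asm, current):
                result.add(asm)
                stack.append(asm)
    return result


# Hypothetical dependency graph: assembly -> assemblies it references.
deps = {
    "App": {"AppLib", "Framework"},
    "AppLib": {"Framework"},
    "Framework": set(),
}

# Hypothetical version bubbles: the application ships as one cluster,
# the framework is serviced separately.
bubble_of = {"App": "app", "AppLib": "app", "Framework": "framework"}


def fragile(asm: str, dep: str) -> bool:
    # Fragile model: compiled code may rely on any dependency's internals.
    return True


def bubbled(asm: str, dep: str) -> bool:
    # Bubble model: internals are visible only inside the same bubble.
    return bubble_of[asm] == bubble_of[dep]


framework_update_fragile = invalidated("Framework", deps, fragile)
framework_update_bubbled = invalidated("Framework", deps, bubbled)
applib_update_bubbled = invalidated("AppLib", deps, bubbled)
```

With these made-up inputs, a framework update invalidates both App and AppLib in the fragile model and nothing in the bubble model, while an update to AppLib still invalidates App because the two share a bubble.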

A direct consequence of version bubbles is that developers can perform the ReadyToRun compilation as part of their build, before the binary is shipped to their customers. Previously this was impossible because there was no way to guarantee that the framework would not change. Now it is possible because the natively compiled code can exclude the framework from its version bubble, so the code will withstand changes in the framework.

What is ILCompiler.Reflection.ReadyToRun?

Similar to System.Reflection.Metadata, ILCompiler.Reflection.ReadyToRun is meant to be a library that allows one to read the data inside a ReadyToRun binary. At its core is a class named ReadyToRunReader, which is the starting point for using the library. With an instance of it, we can start exploring the properties on the object. The most interesting one is Methods, which gives us all the compiled methods. Within a method, we can explore its code and various supporting data structures such as DebugInfo, EHInfo, UnwindInfo, and so on.

Historical background

ILCompiler.Reflection.ReadyToRun was the product of a refactoring I did in late 2019. The refactoring separated the binary parsing logic from the presentation logic of a tool named R2RDump and moved the parsing logic into a library. R2RDump was originally introduced by Amy Yu back in May 2018. The tool was meant for debugging the ReadyToRun compiler. By producing a text dump of the generated code, it lets the compiler developers inspect the output and figure out what went wrong.

Why did I do the refactoring? My goal was to perform some automated validation of the generated data. I could have done that by parsing the text output generated by R2RDump, but it would have been silly, since I had structured data to begin with. To share code with the test harness, I extracted the logic into a separate DLL, and that is how ILCompiler.Reflection.ReadyToRun was born.

Status

Because the code started out as a tool that produces a full text dump, much of its design is tailored towards that goal. For example, a lot of structured objects are currently available only in textual form. A key example of this is the concept of a signature. It is a very rich and recursive data structure, but all one can get from the API is an opaque string, except for the really simple cases I added recently. Contributions along this line are very welcome.

Because it was meant to be a tool for the compiler team to debug with, the output data may contain information that is hard to interpret. In the coming posts in this series, I will explain in more detail how this data should be interpreted. Stay tuned.