Just-in-time compilation

1. Introduction to JIT

In computing, just-in-time (JIT) compilation, also known as dynamic translation, is compilation done during execution of a program – at run time – rather than prior to execution. Most often this means translation to machine code, which is then executed directly, but the term can also refer to translation into another format.

1.1. History of JIT

Self-modifying code has existed since the earliest days of computing, but we exclude that from consideration because there is typically no compilation or translation aspect involved.
Instead, we suspect that the earliest published work on JIT compilation was McCarthy’s LISP paper. He mentioned compilation of functions into machine language, a process fast enough that the compiler’s output needn’t be saved.
This can be seen as an inevitable result of having programs and data share the same notation [McCarthy 1981].
Another early published reference to JIT compilation dates back to 1966. The University of Michigan Executive System for the IBM 7090 explicitly notes that the assembler [University of Michigan 1966b, p. 1] and loader [University of Michigan 1966a, p. 6] can be used to translate and load during execution. (The manual’s preface says that most sections were written before August 1965, so this likely dates back further.)
Thompson’s paper, published in Communications of the ACM, is frequently cited as "early work" in modern publications. He compiled regular expressions into IBM 7094 code in an ad hoc fashion, code which was then executed to perform matching.

Excerpted from: A brief history of Just-In-Time

1.2. Advantages of JIT

Advantages over compiled programs:

  • Typically smaller in size
  • More portable
  • Access to run-time information

Advantages over interpreted programs:

  • Faster execution

Source: https://www.cs.duke.edu/courses/spring10/cps296.1/lectures/17-JIT.pdf

1.3. Steps of JIT

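A JIT works in two steps:
Step 1: generate machine code (or instructions for some virtual machine) at run time.
Step 2: execute, still at run time, the code produced in step 1.

Note: JIT is somewhat similar to the eval function in dynamic languages, which executes a string as code.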
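
The two steps can be sketched without any machine code by targeting virtual machine instructions instead. The following is only an illustrative sketch: the three-opcode stack machine and the names emit_add4 and run_bytecode are made up for this note and do not come from any of the cited sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set of a tiny stack machine (an assumption
   made for illustration only). */
enum { OP_PUSH, OP_ADD, OP_RET };

/* Step 1: generate code at run time. Writes bytecode that computes
   arg + 4 into `code` and returns the number of bytes emitted. */
size_t emit_add4(unsigned char *code, unsigned char arg) {
  size_t n = 0;
  code[n++] = OP_PUSH; code[n++] = arg;  /* push arg */
  code[n++] = OP_PUSH; code[n++] = 4;    /* push 4   */
  code[n++] = OP_ADD;                    /* pop two values, push their sum */
  code[n++] = OP_RET;                    /* return top of stack */
  return n;
}

/* Step 2: execute the code that was just generated. */
long run_bytecode(const unsigned char *code, size_t len) {
  long stack[16];
  size_t sp = 0;  /* number of values currently on the stack */
  size_t pc = 0;  /* index of the next byte to decode */
  while (pc < len) {
    switch (code[pc++]) {
    case OP_PUSH:
      assert(sp < 16 && pc < len);
      stack[sp++] = code[pc++];
      break;
    case OP_ADD:
      assert(sp >= 2);
      stack[sp - 2] += stack[sp - 1];
      sp--;
      break;
    case OP_RET:
      assert(sp >= 1);
      return stack[sp - 1];
    default:
      assert(0);  /* unknown opcode */
    }
  }
  return 0;  /* ran off the end without reaching OP_RET */
}
```

Calling emit_add4(buf, 2) on a buffer of at least 6 bytes and passing the result to run_bytecode yields 6. A real JIT would typically emit native machine code in step 1 so that step 2 is a direct call rather than an interpreter loop.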

2. A JIT example

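Below is a simple JIT example: it first allocates a block of executable memory, then writes x86-64 machine code into that memory, and finally executes the machine code.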

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

// Allocates RWX memory of given size and returns a pointer to it. On failure,
// prints out the error and returns NULL.
void* alloc_executable_memory(size_t size) {
  void* ptr = mmap(0, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ptr == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }
  return ptr;
}

void emit_code_into_memory(unsigned char* m) {
  /* The x86-64 machine code below adds 4 to its argument */
  unsigned char code[] = {
    0x48, 0x89, 0xf8,                   // mov %rdi, %rax
    0x48, 0x83, 0xc0, 0x04,             // add $4, %rax
    0xc3                                // ret
  };
  memcpy(m, code, sizeof(code));
}

typedef long (*JittedFunc)(long);

int main() {
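  void* m = alloc_executable_memory(1024);
  if (m == NULL) {
    return 1;  // allocation failed; the error was already printed
  }
  emit_code_into_memory(m);

  JittedFunc func = m;
  long result = func(2);  // JittedFunc returns long, so print with %ld
  printf("result = %ld\n", result);

  result = func(3);
  printf("result = %ld\n", result);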

  return 0;
}

Running the program above on x86-64 Linux produces the following output:

$ ./test
result = 6
result = 7

The example above is taken from: How to JIT - an introduction, by Eli Bendersky
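
Two variations on this example may be worth sketching. First, memory that is writable and executable at the same time is a security risk, so a more cautious approach maps the memory read/write, copies the code in, and only then switches it to read/execute with mprotect. Second, the generated code can depend on a value known only at run time. The sketch below combines both; make_adder is a hypothetical helper written for this note, and it assumes x86-64 Linux with the System V calling convention.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef long (*JittedFunc)(long);

/* Hypothetical helper: builds a function that adds `n` to its argument.
   `n` must fit in a signed 8-bit immediate. Returns NULL on failure. */
JittedFunc make_adder(int n) {
  unsigned char code[] = {
    0x48, 0x89, 0xf8,        // mov %rdi, %rax
    0x48, 0x83, 0xc0, 0x00,  // add $imm8, %rax (imm8 is patched below)
    0xc3                     // ret
  };
  assert(n >= -128 && n <= 127);
  code[6] = (unsigned char)n;  // bake the run-time value into the instruction

  // Map the memory writable but not executable.
  void* m = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED) {
    return NULL;
  }
  memcpy(m, code, sizeof(code));

  // Drop write permission and add execute permission before running.
  if (mprotect(m, sizeof(code), PROT_READ | PROT_EXEC) != 0) {
    munmap(m, sizeof(code));
    return NULL;
  }
  return (JittedFunc)m;
}
```

With this helper, make_adder(4) returns a function for which func(2) is 6 and func(3) is 7, matching the output shown above. The mapping is never released in this sketch; a longer-lived program would call munmap once the generated function is no longer needed.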

3. JIT libraries

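As the previous example shows, writing machine code by hand when implementing a JIT is error-prone, so it is best to rely on a helper library. Several ready-made JIT libraries exist, such as Mozilla's Nanojit and LuaJIT's DynASM (agentzh's regular-expression engine sregex, for example, uses DynASM).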

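The LLVM compiler framework also provides JIT libraries, which have now reached their third generation: the first-generation Legacy JIT (since removed), the second-generation MCJIT, and the third-generation On-Request-Compilation (ORC) JIT.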

Author: cig01

Created: <2014-11-03 Mon>

Last updated: <2020-06-26 Fri>

Creator: Emacs 27.1 (Org mode 9.4)