Just-in-time compilation

1 Introduction to JIT

In computing, just-in-time (JIT) compilation, also known as dynamic translation, is compilation done during execution of a program – at run time – rather than prior to execution. Most often this consists of translation to machine code, which is then executed directly, but can also refer to translation to another format.

1.1 History of JIT

Self-modifying code has existed since the earliest days of computing, but we exclude that from consideration because there is typically no compilation or translation aspect involved.
Instead, we suspect that the earliest published work on JIT compilation was McCarthy’s LISP paper. He mentioned compilation of functions into machine language, a process fast enough that the compiler’s output needn’t be saved.
This can be seen as an inevitable result of having programs and data share the same notation [McCarthy 1981].
Another early published reference to JIT compilation dates back to 1966. The University of Michigan Executive System for the IBM 7090 explicitly notes that the assembler [University of Michigan 1966b, p. 1] and loader [University of Michigan 1966a, p. 6] can be used to translate and load during execution. (The manual’s preface says that most sections were written before August 1965, so this likely dates back further.)
Thompson’s paper, published in Communications of the ACM, is frequently cited as "early work" in modern publications. He compiled regular expressions into IBM 7094 code in an ad hoc fashion, code which was then executed to perform matching.

Excerpted from: A Brief History of Just-In-Time

1.2 Advantages of JIT

Advantages over compiled programs:

  • Typically smaller in size
  • More portable
  • Access to run-time information

Advantages over interpreted programs:

  • Faster execution

Excerpted from: https://www.cs.duke.edu/courses/spring10/cps296.1/lectures/17-JIT.pdf

1.3 Steps of JIT

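JIT compilation can be divided into two steps:
Step 1: Generate machine code (or instructions for some virtual machine) at run time.
Step 2: Execute the newly generated code, also at run time.

Note: JIT is somewhat similar to the eval facility in dynamic languages, which executes a string as code.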

2 A JIT example

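Below is a simple JIT example: it first allocates a block of executable memory, then writes x86-64 machine code into that memory, and finally executes that machine code.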

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

// Allocates RWX memory of given size and returns a pointer to it. On failure,
// prints out the error and returns NULL.
void* alloc_executable_memory(size_t size) {
  void* ptr = mmap(0, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ptr == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }
  return ptr;
}

void emit_code_into_memory(unsigned char* m) {
  /* The x86-64 machine code below adds 4 to its argument. */
  unsigned char code[] = {
    0x48, 0x89, 0xf8,                   // mov %rdi, %rax
    0x48, 0x83, 0xc0, 0x04,             // add $4, %rax
    0xc3                                // ret
  };
  memcpy(m, code, sizeof(code));
}

typedef long (*JittedFunc)(long);

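int main(void) {
  void* m = alloc_executable_memory(1024);
  if (m == NULL) {
    return 1;
  }
  emit_code_into_memory(m);

  // Treat the start of the filled memory as a function taking and returning long.
  JittedFunc func = (JittedFunc)m;
  long result = func(2);
  printf("result = %ld\n", result);

  result = func(3);
  printf("result = %ld\n", result);

  munmap(m, 1024);
  return 0;
}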

Running the program above on x86-64 Linux produces the following output:

$ ./test
result = 6
result = 7

The example above is taken from: http://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introduction/
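
The example maps its memory as readable, writable, and executable all at once. Some hardened systems restrict such mappings, so a common variation is to write the code into read/write memory first and only then switch the region to read/execute with mprotect. The following is a minimal sketch of that idea, assuming x86-64 Linux; load_code, call_add4, and REGION_SIZE are hypothetical names that do not appear in the original example.

```c
#define _DEFAULT_SOURCE  // exposes MAP_ANONYMOUS under strict -std=c99/c11 on glibc
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE 4096  // assumed to be large enough for the code we load

typedef long (*JittedFunc)(long);

// Hypothetical helper: copies `code` into memory that is first writable and
// then executable, but never both at once. Returns NULL on failure.
JittedFunc load_code(const unsigned char* code, size_t size) {
  assert(size <= REGION_SIZE);

  // Allocate read/write memory; note the absence of PROT_EXEC.
  void* mem = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }
  memcpy(mem, code, size);

  // Switch the region to read/execute before any code in it runs.
  if (mprotect(mem, REGION_SIZE, PROT_READ | PROT_EXEC) == -1) {
    perror("mprotect");
    munmap(mem, REGION_SIZE);
    return NULL;
  }
  return (JittedFunc)mem;
}

// Loads the same "add 4" routine as the example above, calls it once, and
// releases the memory. Aborts via assert if loading fails (sketch only).
long call_add4(long x) {
  static const unsigned char code[] = {
    0x48, 0x89, 0xf8,        // mov %rdi, %rax
    0x48, 0x83, 0xc0, 0x04,  // add $4, %rax
    0xc3                     // ret
  };
  JittedFunc func = load_code(code, sizeof(code));
  assert(func != NULL);
  long result = func(x);
  munmap((void*)func, REGION_SIZE);
  return result;
}
```

Calling call_add4(2) from a main function should give 6, matching the first line of the output above. Remapping on every call is wasteful; a real JIT would keep the region around and reuse it.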

3 JIT libraries

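As the previous example shows, hand-writing machine code when implementing a JIT is error-prone, so it is best to rely on a helper library. Several ready-made JIT libraries already exist, such as Mozilla's Nanojit and LuaJIT's DynASM.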

For example, agentzh's regular expression engine sregex uses DynASM. See: https://github.com/openresty/sregex
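
To give a rough feel for the bookkeeping such libraries take over, here is a toy emitter written by hand. It is only a sketch, assuming x86-64 Linux, and it is not the API of Nanojit or DynASM: CodeBuf, the emit_* helpers, make_adder, and jit_add are all hypothetical names. Unlike the fixed byte array in section 2, the generated code here depends on a value k that is only known at run time.

```c
#define _DEFAULT_SOURCE  // exposes MAP_ANONYMOUS under strict -std=c99/c11 on glibc
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define REGION_SIZE 4096  // assumed to be large enough for the code we emit

typedef long (*JittedFunc)(long);

// Hypothetical toy code buffer; real libraries offer far richer interfaces.
typedef struct {
  unsigned char* buf;  // start of the mapped region
  size_t len;          // number of bytes emitted so far
  size_t cap;          // size of the region in bytes
} CodeBuf;

static void emit_byte(CodeBuf* cb, unsigned char b) {
  assert(cb->len < cb->cap);
  cb->buf[cb->len++] = b;
}

// Emits a 32-bit immediate in little-endian byte order, as x86-64 expects.
static void emit_imm32(CodeBuf* cb, int32_t v) {
  uint32_t u = (uint32_t)v;
  for (int i = 0; i < 4; i++) {
    emit_byte(cb, (unsigned char)((u >> (8 * i)) & 0xff));
  }
}

// mov %rdi, %rax
static void emit_mov_rdi_rax(CodeBuf* cb) {
  emit_byte(cb, 0x48); emit_byte(cb, 0x89); emit_byte(cb, 0xf8);
}

// add $k, %rax (k is a sign-extended 32-bit immediate)
static void emit_add_imm32_rax(CodeBuf* cb, int32_t k) {
  emit_byte(cb, 0x48); emit_byte(cb, 0x05);
  emit_imm32(cb, k);
}

// ret
static void emit_ret(CodeBuf* cb) {
  emit_byte(cb, 0xc3);
}

// Builds a function that computes x + k. Returns NULL on failure.
// The caller releases the region with munmap(ptr, REGION_SIZE).
JittedFunc make_adder(int32_t k) {
  CodeBuf cb = { NULL, 0, REGION_SIZE };
  void* mem = mmap(NULL, cb.cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }
  cb.buf = mem;

  // Step 1: generate the code at run time.
  emit_mov_rdi_rax(&cb);
  emit_add_imm32_rax(&cb, k);
  emit_ret(&cb);

  // Make the finished code executable (and no longer writable).
  if (mprotect(mem, cb.cap, PROT_READ | PROT_EXEC) == -1) {
    perror("mprotect");
    munmap(mem, cb.cap);
    return NULL;
  }
  return (JittedFunc)mem;
}

// Step 2: execute the generated code, then free it.
long jit_add(long x, int32_t k) {
  JittedFunc f = make_adder(k);
  assert(f != NULL);
  long result = f(x);
  munmap((void*)f, REGION_SIZE);
  return result;
}
```

With these helpers, jit_add(2, 4) should return 6. Even this tiny emitter has to get instruction encodings, immediate widths, and byte order right by hand, which is the kind of detail a JIT library is meant to hide.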


Author: cig01

Created: <2014-11-03 Mon 00:00>

Last updated: <2018-04-28 Sat 22:52>
