System call

1. Introduction to system calls

In computing, a system call is how a program requests a service from an operating system's kernel.

System calls form the interface between an application program and the operating system.

The operating system is responsible for

  • Process Management (starting, running, stopping processes)
  • File Management (creating, opening, closing, reading, writing, renaming files)
  • Memory Management (allocating, deallocating memory)
  • Other services (timing, scheduling, network management)

An application program makes a system call to get the operating system to perform a service for it, like reading from a file.
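As a minimal sketch of what "making a system call" looks like from C on Linux, the helpers below request the read and write services through the generic syscall(2) wrapper from libc instead of the usual read()/write() functions. The helper names raw_write and raw_read are made up for this illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write `len` bytes from `buf` to `fd` by invoking the write system call
 * through the generic syscall(2) wrapper. Returns the byte count, or -1
 * with errno set on failure. (raw_write is a hypothetical helper name.) */
static long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}

/* Read up to `len` bytes from `fd` into `buf` through the read system call.
 * (raw_read is a hypothetical helper name.) */
static long raw_read(int fd, void *buf, size_t len)
{
    return syscall(SYS_read, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) from main would print to standard output in the same way write(1, "hi\n", 3) does; the libc function is normally just a thin wrapper around the same kernel entry point.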

References:
http://cs.lmu.edu/~ray/notes/linuxsyscalls/

System V Application Binary Interface: Intel386 Architecture Processor Supplement. Describes the data representation, register usage, stack management, and function-calling sequence the System V ABI uses in the IA-32 architecture. This document is located at http://www.sco.com/developers/devspecs/abi386-4.pdf.

System V Application Binary Interface AMD64 Architecture Processor Supplement, found at http://www.x86-64.org/documentation/abi.pdf.

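1.1. Using system calls

How are system calls used?
Method 1: the traditional way
On the i386 platform, system calls are generally implemented with a software interrupt (the call number is normally placed in the eax register): int 0x80 on Linux and int 0x2e on Windows.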

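In protected mode, when the CPU handles an INT instruction it first fetches the corresponding gate descriptor from the Interrupt Descriptor Table (IDT) and determines the descriptor's type. It then compares the gate descriptor's privilege level (DPL) with the caller's current privilege level (CPL). The call succeeds only when CPL <= DPL, that is, when the caller is at least as privileged as the level the descriptor specifies (a numerically smaller value means higher privilege). Finally, based on the descriptor's contents, the CPU pushes state onto the stack, jumps to the handler, and raises the privilege level. When the kernel code finishes, it executes IRET, which restores the user stack and jumps back to the lower-privilege code.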

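Method 2: the new way
The transition from Ring 3 to Ring 0 during a system call wastes quite a few CPU cycles. A system call always goes from Ring 3 to Ring 0 (except when the INT instruction is issued from inside the kernel, which is mostly the work of hackers' kernel modules), so the privilege levels before and after the transition are fixed: the CPL is always 3, and the DPL of the INT 0x80 gate is always 3 as well. Having the CPU compare the gate descriptor's DPL with the caller's CPL is therefore entirely unnecessary. For this reason, Intel x86 CPUs from the Pentium II 300 (Family 6, Model 3, Stepping 3) onward support the new system call instructions sysenter/sysexit. sysenter enters Ring 0 from Ring 3, and sysexit returns from Ring 0 to Ring 3. Because there is no privilege-level check and no stack push, they execute considerably faster than INT n/IRET.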

Besides Intel's sysenter/sysexit, AMD has similar fast system call instructions: syscall/sysret.
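As a rough illustration of the fast-path instruction, the sketch below issues getpid with the syscall instruction directly through GCC/Clang inline assembly. It assumes a 64-bit x86 Linux target, where syscall is the normal entry instruction; on any other architecture it falls back to the libc syscall(2) wrapper. The function name getpid_fast is invented for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Return the caller's PID by issuing the system call instruction directly.
 * (getpid_fast is a hypothetical name used only in this sketch.) */
static long getpid_fast(void)
{
#if defined(__x86_64__)
    long ret;
    /* rax carries the call number in and the result out; the syscall
     * instruction itself overwrites rcx and r11, so they are clobbers. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Other architectures: let libc choose the right instruction. */
    return syscall(SYS_getpid);
#endif
}
```

No stack frame is pushed by the CPU on this path: the return address is saved in rcx and the flags in r11, which is why both registers appear in the clobber list.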

Reference:
Linux 2.6 support for fast system calls on new CPUs: http://www.ibm.com/developerworks/cn/linux/kernel/l-k26ncpu/index.html

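2. vsyscall and vDSO (Virtual Dynamically-linked Shared Object)

vsyscall and vDSO (Virtual Dynamically-linked Shared Object) are two mechanisms that Linux uses to speed up system calls.

When you run the command below to view the memory map, you will see a vdso segment and a vsyscall segment in addition to the familiar heap and stack segments.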

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 08:05 465620                             /bin/cat
0060a000-0060b000 r--p 0000a000 08:05 465620                             /bin/cat
0060b000-0060c000 rw-p 0000b000 08:05 465620                             /bin/cat
00c2e000-00c4f000 rw-p 00000000 00:00 0                                  [heap]
7fe11d121000-7fe11d808000 r--p 00000000 08:05 267551                     /usr/lib/locale/locale-archive
7fe11d808000-7fe11d9bd000 r-xp 00000000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe11d9bd000-7fe11dbbd000 ---p 001b5000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe11dbbd000-7fe11dbc1000 r--p 001b5000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe11dbc1000-7fe11dbc3000 rw-p 001b9000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7fe11dbc3000-7fe11dbc8000 rw-p 00000000 00:00 0 
7fe11dbc8000-7fe11dbea000 r-xp 00000000 08:05 131211                     /lib/x86_64-linux-gnu/ld-2.15.so
7fe11ddcf000-7fe11ddd2000 rw-p 00000000 00:00 0 
7fe11dde8000-7fe11ddea000 rw-p 00000000 00:00 0 
7fe11ddea000-7fe11ddeb000 r--p 00022000 08:05 131211                     /lib/x86_64-linux-gnu/ld-2.15.so
7fe11ddeb000-7fe11dded000 rw-p 00023000 08:05 131211                     /lib/x86_64-linux-gnu/ld-2.15.so
7fff511a7000-7fff511c8000 rw-p 00000000 00:00 0                          [stack]
7fff511ce000-7fff511cf000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

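The vdso is a virtual .so file provided by the kernel. It does not exist on disk but inside the kernel, which maps it into the address space of every program so that it is shared by all of them. In the listing above its text segment is one page in size. It maps certain kernel calls into the user-space address space, which makes them cheaper to invoke.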
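To check from inside a program whether these special mappings are present, one option is to parse /proc/self/maps. The sketch below does that under the assumption that the file follows the usual "start-end perms offset dev inode pathname" layout shown in the listing; find_mapping is a made-up helper name.

```c
#include <stdio.h>
#include <string.h>

/* Look for a mapping whose name (last field of /proc/self/maps) equals
 * `name`, for example "[vdso]". On success store its bounds and return 1;
 * return 0 if it is absent and -1 if the maps file cannot be opened.
 * (find_mapping is a hypothetical helper for this sketch.) */
static int find_mapping(const char *name, unsigned long *start, unsigned long *end)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        unsigned long lo, hi;
        char path[256] = "";
        /* Fields: start-end perms offset dev inode [pathname].
         * Anonymous mappings have no pathname, so sscanf returns 2. */
        int n = sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s", &lo, &hi, path);
        if (n == 3 && strcmp(path, name) == 0) {
            *start = lo;
            *end = hi;
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Calling find_mapping("[vdso]", &lo, &hi) and printing hi - lo shows how large the vdso mapping is on the running kernel, which need not be a single page.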

Reference: http://lwn.net/Articles/446528/

2.1. The origin of vsyscall and vDSO

The vsyscall and vDSO segments are two mechanisms used to accelerate certain system calls in Linux. For instance, gettimeofday is usually invoked through this mechanism. The first mechanism introduced was vsyscall, which was added as a way to execute specific system calls that do not need any real level of privilege to run, in order to reduce the system call overhead. Following the previous example, all gettimeofday needs to do is read the kernel's current time. Some applications call gettimeofday frequently (e.g., to generate timestamps), to the point that they care about even a little overhead. To address this concern, the kernel maps into user space a page containing the current time and a fast gettimeofday implementation (i.e., just a function that reads the time saved in the vsyscall page). Using this virtual system call, the C library can provide a fast gettimeofday that avoids the overhead of switching between user space and kernel space, which the classic system call model (INT 0x80 or SYSCALL) always incurs.

However, the vsyscall mechanism has some limitations: the memory allocated is small and allows only 4 system calls, and, more seriously, the vsyscall page is mapped at the same address in every process, since the location of the vsyscall page is nailed down in the kernel ABI. This static placement undermines the benefit of the address space randomisation commonly used by Linux. An attacker who has compromised an application by exploiting a stack overflow can invoke a system call from the vsyscall page with arbitrary parameters. All the attacker needs is the address of the system call, which is easily predictable because it is statically allocated (if you run your command again, even with different applications, you will notice that the address of the vsyscall page does not change). It would be nice to remove or at least randomize the location of the vsyscall page to thwart this type of attack. Unfortunately, applications depend on the existence and exact address of that page, so it cannot simply be moved or removed.

This security issue has been addressed by replacing all system call instructions at fixed addresses by a special trap instruction. An application trying to call into the vsyscall page will trap into the kernel, which will then emulate the desired virtual system call in kernel space. The result is a kernel system call emulating a virtual system call which was put there to avoid the kernel system call in the first place. The result is a vsyscall which takes longer to execute but, crucially, does not break the existing ABI. In any case, the slowdown will only be seen if the application is trying to use the vsyscall page instead of the vDSO.

The vDSO offers the same functionality as the vsyscall page while overcoming its limitations. The vDSO (Virtual Dynamically-linked Shared Object) is a memory area mapped into user space that exposes some kernel functionality to user space in a safe manner. It was introduced to solve the security threats caused by vsyscall. The vDSO is placed dynamically, which addresses the security concerns, and it can hold more than 4 system calls. Access to the vDSO is provided through glibc: when a routine such as gettimeofday has an accompanying vDSO version, glibc uses it. If your kernel does not have vDSO support, a traditional syscall is made instead when your program executes.
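Because the vDSO is not at a fixed address, a process has to be told where it is. On Linux the kernel passes the address in the auxiliary vector under the key AT_SYSINFO_EHDR, and getauxval(3) reads it. The sketch below, which assumes a libc that provides getauxval (glibc 2.16 or later), fetches that address and checks that the mapping starts with the ELF magic bytes. The names vdso_base and looks_like_elf are invented for the example.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the address where the kernel mapped the vDSO into this process,
 * or NULL if the kernel did not provide one.
 * (vdso_base is a hypothetical helper name.) */
static const void *vdso_base(void)
{
    return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if the memory at `base` starts with the ELF magic "\177ELF".
 * (looks_like_elf is a hypothetical helper name.) */
static int looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

With address space randomisation enabled, printing vdso_base() in two separate runs of the same program will normally show two different addresses, in contrast to the fixed vsyscall page.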

Reference:
http://stackoverflow.com/questions/19938324/what-are-vdso-and-vsyscall

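2.2. What exactly is in the vdso segment?

First, disable address space randomization so that the vdso address stays the same between runs, which makes the analysis easier.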

$ sudo sysctl -w kernel.randomize_va_space=0

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 08:05 465620                             /bin/cat
0060a000-0060b000 r--p 0000a000 08:05 465620                             /bin/cat
0060b000-0060c000 rw-p 0000b000 08:05 465620                             /bin/cat
0060c000-0062d000 rw-p 00000000 00:00 0                                  [heap]
7ffff7333000-7ffff7a1a000 r--p 00000000 08:05 267551                     /usr/lib/locale/locale-archive
7ffff7a1a000-7ffff7bcf000 r-xp 00000000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7bcf000-7ffff7dcf000 ---p 001b5000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7dcf000-7ffff7dd3000 r--p 001b5000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7dd3000-7ffff7dd5000 rw-p 001b9000 08:05 131188                     /lib/x86_64-linux-gnu/libc-2.15.so
7ffff7dd5000-7ffff7dda000 rw-p 00000000 00:00 0 
7ffff7dda000-7ffff7dfc000 r-xp 00000000 08:05 131211                     /lib/x86_64-linux-gnu/ld-2.15.so
7ffff7fe0000-7ffff7fe3000 rw-p 00000000 00:00 0 
7ffff7ff9000-7ffff7ffb000 rw-p 00000000 00:00 0 
7ffff7ffb000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00022000 08:05 131211                     /lib/x86_64-linux-gnu/ld-2.15.so
7ffff7ffd000-7ffff7fff000 rw-p 00023000 08:05 131211                     /lib/x86_64-linux-gnu/ld-2.15.so
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

$ dd if=/proc/self/mem of=vdso.so bs=4096 skip=$((0x7ffff7ffb)) count=1
dd: ‘/proc/self/mem’: cannot skip to specified offset
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000443401 s, 9.2 MB/s

$ file vdso.so
vdso.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x2091c544164d1d49d969d40b727cdc0e2ea344a5, stripped

$ objdump -d vdso.so | grep ">:"
0000000000000970 <__vdso_clock_gettime>:
0000000000000ee0 <__vdso_gettimeofday>:
00000000000011a0 <__vdso_time>:
00000000000011c0 <__vdso_getcpu>:

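The output above shows that the vdso contains code only for __vdso_clock_gettime, __vdso_gettimeofday, __vdso_time, and __vdso_getcpu.
So only calls such as clock_gettime, gettimeofday, getcpu, and time use the vdso fast path; all other system calls still go directly through the syscall instruction.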
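The two paths can be placed side by side in C. In the sketch below, the first helper calls clock_gettime through libc, which on a kernel with vdso support is expected to end up in __vdso_clock_gettime, while the second forces a real kernel entry with syscall(2). It assumes a 64-bit Linux target, where the raw system call uses the same struct timespec layout as libc. Both helper names are made up.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Read the monotonic clock through libc, which normally dispatches to
 * the vdso implementation without entering the kernel.
 * (monotonic_via_libc is a hypothetical helper name.) */
static int monotonic_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Read the same clock with a real system call, bypassing the vdso.
 * Assumes a 64-bit target so that the kernel's timespec layout matches.
 * (monotonic_via_syscall is a hypothetical helper name.) */
static int monotonic_via_syscall(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Running each helper in a tight loop and timing it is a simple way to see the difference, and under strace only the second helper shows up as a clock_gettime entry when the vdso path is in use.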

Reference:
http://blog.csdn.net/wlp600/article/details/6886162

3. Architecture calling conventions

Every architecture has its own way of invoking and passing arguments to the kernel. The details for various architectures are listed in the two tables below.

Table 1: The instruction used to transition to kernel mode
arch/ABI   instruction            syscall #   retval    Notes
arm/OABI   swi NR                 -           a1        NR is syscall #
arm/EABI   swi 0x0                r7          r0
blackfin   excpt 0x0              P0          R0
i386       int $0x80              eax         eax
ia64       break 0x100000         r15         r10/r8    bool error/errno value
parisc     ble 0x100(%sr2, %r0)   r20         r28
s390       svc 0                  r1          r2
sparc/32   t 0x10                 g1          o0
sparc/64   t 0x6d                 g1          o0
x86_64     syscall                rax         rax

Table 2: The registers used to pass the system call arguments
arch/ABI   arg1   arg2   arg3   arg4   arg5   arg6   arg7
arm/OABI   a1     a2     a3     a4     v1     v2     v3
arm/EABI   r0     r1     r2     r3     r4     r5     r6
blackfin   R0     R1     R2     R3     R4     R5     -
i386       ebx    ecx    edx    esi    edi    ebp    -
ia64       out0   out1   out2   out3   out4   out5   -
parisc     r26    r25    r24    r23    r22    r21    -
s390       r2     r3     r4     r5     r6     r7     -
sparc/32   o0     o1     o2     o3     o4     o5     -
sparc/64   o0     o1     o2     o3     o4     o5     -
x86_64     rdi    rsi    rdx    r10    r8     r9     -
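To make the x86_64 row concrete, here is a sketch of a six-argument system call written with GCC/Clang inline assembly. It loads the call number into rax and the arguments into rdi, rsi, rdx, r10, r8, and r9, then executes syscall. The helper name syscall6 is invented for this example, and on non-x86_64 targets the code simply defers to the libc syscall(2) wrapper, whose error convention differs as noted in the comments.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a six-argument system call using the x86_64 register convention.
 * (syscall6 is a hypothetical helper name used only in this sketch.) */
static long syscall6(long nr, long a1, long a2, long a3,
                     long a4, long a5, long a6)
{
#if defined(__x86_64__)
    long ret;
    /* arg4..arg6 have no single-letter constraints, so pin them to
     * r10, r8 and r9 with explicit register variables. */
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3),
                        "r"(r10), "r"(r8), "r"(r9)
                      : "rcx", "r11", "memory");
    return ret;   /* on failure the kernel returns -errno in rax */
#else
    /* Fallback: libc wrapper, which returns -1 and sets errno instead. */
    return syscall(nr, a1, a2, a3, a4, a5, a6);
#endif
}
```

For example, syscall6(SYS_mmap, 0, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) requests one anonymous page. Note that the fourth argument travels in r10 rather than rcx, because the syscall instruction overwrites rcx with the return address.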

References:
man syscall in Debian
man syscalls in Debian

4. Linux system call table for i386

The Linux system call table for i386 is shown in Table 3.

Table 3: Linux system call table for i386
%eax Name Source %ebx %ecx %edx %esi %edi
1 sys_exit kernel/exit.c int - - - -
2 sys_fork arch/i386/kernel/process.c struct pt_regs - - - -
3 sys_read fs/read_write.c unsigned int char * size_t - -
4 sys_write fs/read_write.c unsigned int const char * size_t - -
5 sys_open fs/open.c const char * int int - -
6 sys_close fs/open.c unsigned int - - - -
7 sys_waitpid kernel/exit.c pid_t unsigned int * int - -
8 sys_creat fs/open.c const char * int - - -
9 sys_link fs/namei.c const char * const char * - - -
10 sys_unlink fs/namei.c const char * - - - -
11 sys_execve arch/i386/kernel/process.c struct pt_regs - - - -
12 sys_chdir fs/open.c const char * - - - -
13 sys_time kernel/time.c int * - - - -
14 sys_mknod fs/namei.c const char * int dev_t - -
15 sys_chmod fs/open.c const char * mode_t - - -
16 sys_lchown fs/open.c const char * uid_t gid_t - -
18 sys_stat fs/stat.c char * struct __old_kernel_stat * - - -
19 sys_lseek fs/read_write.c unsigned int off_t unsigned int - -
20 sys_getpid kernel/sched.c - - - - -
21 sys_mount fs/super.c char * char * char * - -
22 sys_oldumount fs/super.c char * - - - -
23 sys_setuid kernel/sys.c uid_t - - - -
24 sys_getuid kernel/sched.c - - - - -
25 sys_stime kernel/time.c int * - - - -
26 sys_ptrace arch/i386/kernel/ptrace.c long long long long -
27 sys_alarm kernel/sched.c unsigned int - - - -
28 sys_fstat fs/stat.c unsigned int struct __old_kernel_stat * - - -
29 sys_pause arch/i386/kernel/sys_i386.c - - - - -
30 sys_utime fs/open.c char * struct utimbuf * - - -
33 sys_access fs/open.c const char * int - - -
34 sys_nice kernel/sched.c int - - - -
36 sys_sync fs/buffer.c - - - - -
37 sys_kill kernel/signal.c int int - - -
38 sys_rename fs/namei.c const char * const char * - - -
39 sys_mkdir fs/namei.c const char * int - - -
40 sys_rmdir fs/namei.c const char * - - - -
41 sys_dup fs/fcntl.c unsigned int - - - -
42 sys_pipe arch/i386/kernel/sys_i386.c unsigned long * - - - -
43 sys_times kernel/sys.c struct tms * - - - -
45 sys_brk mm/mmap.c unsigned long - - - -
46 sys_setgid kernel/sys.c gid_t - - - -
47 sys_getgid kernel/sched.c - - - - -
48 sys_signal kernel/signal.c int __sighandler_t - - -
49 sys_geteuid kernel/sched.c - - - - -
50 sys_getegid kernel/sched.c - - - - -
51 sys_acct kernel/acct.c const char * - - - -
52 sys_umount fs/super.c char * int - - -
54 sys_ioctl fs/ioctl.c unsigned int unsigned int unsigned long - -
55 sys_fcntl fs/fcntl.c unsigned int unsigned int unsigned long - -
57 sys_setpgid kernel/sys.c pid_t pid_t - - -
59 sys_olduname arch/i386/kernel/sys_i386.c struct oldold_utsname * - - - -
60 sys_umask kernel/sys.c int - - - -
61 sys_chroot fs/open.c const char * - - - -
62 sys_ustat fs/super.c dev_t struct ustat * - - -
63 sys_dup2 fs/fcntl.c unsigned int unsigned int - - -
64 sys_getppid kernel/sched.c - - - - -
65 sys_getpgrp kernel/sys.c - - - - -
66 sys_setsid kernel/sys.c - - - - -
67 sys_sigaction arch/i386/kernel/signal.c int const struct old_sigaction * struct old_sigaction * - -
68 sys_sgetmask kernel/signal.c - - - - -
69 sys_ssetmask kernel/signal.c int - - - -
70 sys_setreuid kernel/sys.c uid_t uid_t - - -
71 sys_setregid kernel/sys.c gid_t gid_t - - -
72 sys_sigsuspend arch/i386/kernel/signal.c int int old_sigset_t - -
73 sys_sigpending kernel/signal.c old_sigset_t * - - - -
74 sys_sethostname kernel/sys.c char * int - - -
75 sys_setrlimit kernel/sys.c unsigned int struct rlimit * - - -
76 sys_getrlimit kernel/sys.c unsigned int struct rlimit * - - -
77 sys_getrusage kernel/sys.c int struct rusage * - - -
78 sys_gettimeofday kernel/time.c struct timeval * struct timezone * - - -
79 sys_settimeofday kernel/time.c struct timeval * struct timezone * - - -
80 sys_getgroups kernel/sys.c int gid_t * - - -
81 sys_setgroups kernel/sys.c int gid_t * - - -
82 old_select arch/i386/kernel/sys_i386.c struct sel_arg_struct * - - - -
83 sys_symlink fs/namei.c const char * const char * - - -
84 sys_lstat fs/stat.c char * struct __old_kernel_stat * - - -
85 sys_readlink fs/stat.c const char * char * int - -
86 sys_uselib fs/exec.c const char * - - - -
87 sys_swapon mm/swapfile.c const char * int - - -
88 sys_reboot kernel/sys.c int int int void * -
89 old_readdir fs/readdir.c unsigned int void * unsigned int - -
90 old_mmap arch/i386/kernel/sys_i386.c struct mmap_arg_struct * - - - -
91 sys_munmap mm/mmap.c unsigned long size_t - - -
92 sys_truncate fs/open.c const char * unsigned long - - -
93 sys_ftruncate fs/open.c unsigned int unsigned long - - -
94 sys_fchmod fs/open.c unsigned int mode_t - - -
95 sys_fchown fs/open.c unsigned int uid_t gid_t - -
96 sys_getpriority kernel/sys.c int int - - -
97 sys_setpriority kernel/sys.c int int int - -
99 sys_statfs fs/open.c const char * struct statfs * - - -
100 sys_fstatfs fs/open.c unsigned int struct statfs * - - -
101 sys_ioperm arch/i386/kernel/ioport.c unsigned long unsigned long int - -
102 sys_socketcall net/socket.c int unsigned long * - - -
103 sys_syslog kernel/printk.c int char * int - -
104 sys_setitimer kernel/itimer.c int struct itimerval * struct itimerval * - -
105 sys_getitimer kernel/itimer.c int struct itimerval * - - -
106 sys_newstat fs/stat.c char * struct stat * - - -
107 sys_newlstat fs/stat.c char * struct stat * - - -
108 sys_newfstat fs/stat.c unsigned int struct stat * - - -
109 sys_uname arch/i386/kernel/sys_i386.c struct old_utsname * - - - -
110 sys_iopl arch/i386/kernel/ioport.c unsigned long - - - -
111 sys_vhangup fs/open.c - - - - -
112 sys_idle arch/i386/kernel/process.c - - - - -
113 sys_vm86old arch/i386/kernel/vm86.c unsigned long struct vm86plus_struct * - - -
114 sys_wait4 kernel/exit.c pid_t unsigned long * int struct rusage * -
115 sys_swapoff mm/swapfile.c const char * - - - -
116 sys_sysinfo kernel/info.c struct sysinfo * - - - -
117 sys_ipc (*Note) arch/i386/kernel/sys_i386.c uint int int int void *
118 sys_fsync fs/buffer.c unsigned int - - - -
119 sys_sigreturn arch/i386/kernel/signal.c unsigned long - - - -
120 sys_clone arch/i386/kernel/process.c struct pt_regs - - - -
121 sys_setdomainname kernel/sys.c char * int - - -
122 sys_newuname kernel/sys.c struct new_utsname * - - - -
123 sys_modify_ldt arch/i386/kernel/ldt.c int void * unsigned long - -
124 sys_adjtimex kernel/time.c struct timex * - - - -
125 sys_mprotect mm/mprotect.c unsigned long size_t unsigned long - -
126 sys_sigprocmask kernel/signal.c int old_sigset_t * old_sigset_t * - -
127 sys_create_module kernel/module.c const char * size_t - - -
128 sys_init_module kernel/module.c const char * struct module * - - -
129 sys_delete_module kernel/module.c const char * - - - -
130 sys_get_kernel_syms kernel/module.c struct kernel_sym * - - - -
131 sys_quotactl fs/dquot.c int const char * int caddr_t -
132 sys_getpgid kernel/sys.c pid_t - - - -
133 sys_fchdir fs/open.c unsigned int - - - -
134 sys_bdflush fs/buffer.c int long - - -
135 sys_sysfs fs/super.c int unsigned long unsigned long - -
136 sys_personality kernel/exec_domain.c unsigned long - - - -
138 sys_setfsuid kernel/sys.c uid_t - - - -
139 sys_setfsgid kernel/sys.c gid_t - - - -
140 sys_llseek fs/read_write.c unsigned int unsigned long unsigned long loff_t * unsigned int
141 sys_getdents fs/readdir.c unsigned int void * unsigned int - -
142 sys_select fs/select.c int fd_set * fd_set * fd_set * struct timeval *
143 sys_flock fs/locks.c unsigned int unsigned int - - -
144 sys_msync mm/filemap.c unsigned long size_t int - -
145 sys_readv fs/read_write.c unsigned long const struct iovec * unsigned long - -
146 sys_writev fs/read_write.c unsigned long const struct iovec * unsigned long - -
147 sys_getsid kernel/sys.c pid_t - - - -
148 sys_fdatasync fs/buffer.c unsigned int - - - -
149 sys_sysctl kernel/sysctl.c struct __sysctl_args * - - - -
150 sys_mlock mm/mlock.c unsigned long size_t - - -
151 sys_munlock mm/mlock.c unsigned long size_t - - -
152 sys_mlockall mm/mlock.c int - - - -
153 sys_munlockall mm/mlock.c - - - - -
154 sys_sched_setparam kernel/sched.c pid_t struct sched_param * - - -
155 sys_sched_getparam kernel/sched.c pid_t struct sched_param * - - -
156 sys_sched_setscheduler kernel/sched.c pid_t int struct sched_param * - -
157 sys_sched_getscheduler kernel/sched.c pid_t - - - -
158 sys_sched_yield kernel/sched.c - - - - -
159 sys_sched_get_priority_max kernel/sched.c int - - - -
160 sys_sched_get_priority_min kernel/sched.c int - - - -
161 sys_sched_rr_get_interval kernel/sched.c pid_t struct timespec * - - -
162 sys_nanosleep kernel/sched.c struct timespec * struct timespec * - - -
163 sys_mremap mm/mremap.c unsigned long unsigned long unsigned long unsigned long -
164 sys_setresuid kernel/sys.c uid_t uid_t uid_t - -
165 sys_getresuid kernel/sys.c uid_t * uid_t * uid_t * - -
166 sys_vm86 arch/i386/kernel/vm86.c struct vm86_struct * - - - -
167 sys_query_module kernel/module.c const char * int char * size_t size_t *
168 sys_poll fs/select.c struct pollfd * unsigned int long - -
169 sys_nfsservctl fs/filesystems.c int void * void * - -
170 sys_setresgid kernel/sys.c gid_t gid_t gid_t - -
171 sys_getresgid kernel/sys.c gid_t * gid_t * gid_t * - -
172 sys_prctl kernel/sys.c int unsigned long unsigned long unsigned long unsigned long
173 sys_rt_sigreturn arch/i386/kernel/signal.c unsigned long - - - -
174 sys_rt_sigaction kernel/signal.c int const struct sigaction * struct sigaction * size_t -
175 sys_rt_sigprocmask kernel/signal.c int sigset_t * sigset_t * size_t -
176 sys_rt_sigpending kernel/signal.c sigset_t * size_t - - -
177 sys_rt_sigtimedwait kernel/signal.c const sigset_t * siginfo_t * const struct timespec * size_t -
178 sys_rt_sigqueueinfo kernel/signal.c int int siginfo_t * - -
179 sys_rt_sigsuspend arch/i386/kernel/signal.c sigset_t * size_t - - -
180 sys_pread fs/read_write.c unsigned int char * size_t loff_t -
181 sys_pwrite fs/read_write.c unsigned int const char * size_t loff_t -
182 sys_chown fs/open.c const char * uid_t gid_t - -
183 sys_getcwd fs/dcache.c char * unsigned long - - -
184 sys_capget kernel/capability.c cap_user_header_t cap_user_data_t - - -
185 sys_capset kernel/capability.c cap_user_header_t const cap_user_data_t - - -
186 sys_sigaltstack arch/i386/kernel/signal.c const stack_t * stack_t * - - -
187 sys_sendfile mm/filemap.c int int off_t * size_t -
190 sys_vfork arch/i386/kernel/process.c struct pt_regs - - - -

For the numbers of the syscalls, look in arch/i386/kernel/entry.S for sys_call_table. The syscall numbers are offsets into that table. Several spots in the table are occupied by the syscall sys_ni_syscall. This is a placeholder that either replaces an obsolete syscall or reserves a spot for future syscalls.
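As an illustration of how a row of Table 3 turns into code, the sketch below performs sys_write (number 4) with int $0x80, placing the call number in eax and the three arguments in ebx, ecx, and edx. The inline assembly is only compiled for 32-bit x86 targets; on anything else the function falls back to the libc syscall(2) wrapper so that it still builds. The name write_int80 is made up, and the "b" constraint assumes a compiler that allows ebx as an operand in the chosen build mode (GCC 5 or later does, even with -fPIC).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write `len` bytes from `buf` to `fd` via the legacy i386 entry path.
 * (write_int80 is a hypothetical helper name used only in this sketch.) */
static long write_int80(int fd, const void *buf, size_t len)
{
#if defined(__i386__)
    long ret;
    /* eax = 4 (sys_write), ebx = fd, ecx = buf, edx = len;
     * the result, or -errno on failure, comes back in eax. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(4L), "b"((long)fd), "c"(buf), "d"((long)len)
                      : "memory");
    return ret;
#else
    /* Not a 32-bit x86 build: use the portable wrapper instead. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

The literal 4 is only valid for i386; other architectures number their system calls differently, which is why portable code should take the number from <sys/syscall.h> rather than hard-coding it.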

Reference:
http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html

5. Linux system call table for x86_64

The Linux system call table for x86_64 is shown in Table 4.

Table 4: Linux system call table for x86_64
%rax System call %rdi %rsi %rdx %r10 %r8 %r9
0 sys_read unsigned int fd char *buf size_t count      
1 sys_write unsigned int fd const char *buf size_t count      
2 sys_open const char *filename int flags int mode      
3 sys_close unsigned int fd          
4 sys_stat const char *filename struct stat *statbuf        
5 sys_fstat unsigned int fd struct stat *statbuf        
6 sys_lstat const char *filename struct stat *statbuf        
7 sys_poll struct pollfd *ufds unsigned int nfds long timeout_msecs      
8 sys_lseek unsigned int fd off_t offset unsigned int origin      
9 sys_mmap unsigned long addr unsigned long len unsigned long prot unsigned long flags unsigned long fd unsigned long off
10 sys_mprotect unsigned long start size_t len unsigned long prot      
11 sys_munmap unsigned long addr size_t len        
12 sys_brk unsigned long brk          
13 sys_rt_sigaction int sig const struct sigaction *act struct sigaction *oact size_t sigsetsize    
14 sys_rt_sigprocmask int how sigset_t *nset sigset_t *oset size_t sigsetsize    
15 sys_rt_sigreturn unsigned long __unused          
16 sys_ioctl unsigned int fd unsigned int cmd unsigned long arg      
17 sys_pread64 unsigned long fd char *buf size_t count loff_t pos    
18 sys_pwrite64 unsigned int fd const char *buf size_t count loff_t pos    
19 sys_readv unsigned long fd const struct iovec *vec unsigned long vlen      
20 sys_writev unsigned long fd const struct iovec *vec unsigned long vlen      
21 sys_access const char *filename int mode        
22 sys_pipe int *filedes          
23 sys_select int n fd_set *inp fd_set *outp fd_set *exp struct timeval *tvp  
24 sys_sched_yield            
25 sys_mremap unsigned long addr unsigned long old_len unsigned long new_len unsigned long flags unsigned long new_addr  
26 sys_msync unsigned long start size_t len int flags      
27 sys_mincore unsigned long start size_t len unsigned char *vec      
28 sys_madvise unsigned long start size_t len_in int behavior      
29 sys_shmget key_t key size_t size int shmflg      
30 sys_shmat int shmid char *shmaddr int shmflg      
31 sys_shmctl int shmid int cmd struct shmid_ds *buf      
32 sys_dup unsigned int fildes          
33 sys_dup2 unsigned int oldfd unsigned int newfd        
34 sys_pause            
35 sys_nanosleep struct timespec *rqtp struct timespec *rmtp        
36 sys_getitimer int which struct itimerval *value        
37 sys_alarm unsigned int seconds          
38 sys_setitimer int which struct itimerval *value struct itimerval *ovalue      
39 sys_getpid            
40 sys_sendfile int out_fd int in_fd off_t *offset size_t count    
41 sys_socket int family int type int protocol      
42 sys_connect int fd struct sockaddr *uservaddr int addrlen      
43 sys_accept int fd struct sockaddr *upeer_sockaddr int *upeer_addrlen      
44 sys_sendto int fd void *buff size_t len unsigned flags struct sockaddr *addr int addr_len
45 sys_recvfrom int fd void *ubuf size_t size unsigned flags struct sockaddr *addr int *addr_len
46 sys_sendmsg int fd struct msghdr *msg unsigned flags      
47 sys_recvmsg int fd struct msghdr *msg unsigned int flags      
48 sys_shutdown int fd int how        
49 sys_bind int fd struct sockaddr *umyaddr int addrlen      
50 sys_listen int fd int backlog        
51 sys_getsockname int fd struct sockaddr *usockaddr int *usockaddr_len      
52 sys_getpeername int fd struct sockaddr *usockaddr int *usockaddr_len      
53 sys_socketpair int family int type int protocol int *usockvec    
54 sys_setsockopt int fd int level int optname char *optval int optlen  
55 sys_getsockopt int fd int level int optname char *optval int *optlen  
56 sys_clone unsigned long clone_flags unsigned long newsp void *parent_tid void *child_tid    
57 sys_fork            
58 sys_vfork            
59 sys_execve const char *filename const char *const argv[] const char *const envp[]      
60 sys_exit int error_code          
61 sys_wait4 pid_t upid int *stat_addr int options struct rusage *ru    
62 sys_kill pid_t pid int sig        
63 sys_uname struct old_utsname *name          
64 sys_semget key_t key int nsems int semflg      
65 sys_semop int semid struct sembuf *tsops unsigned nsops      
66 sys_semctl int semid int semnum int cmd union semun arg    
67 sys_shmdt char *shmaddr          
68 sys_msgget key_t key int msgflg        
69 sys_msgsnd int msqid struct msgbuf *msgp size_t msgsz int msgflg    
70 sys_msgrcv int msqid struct msgbuf *msgp size_t msgsz long msgtyp int msgflg  
71 sys_msgctl int msqid int cmd struct msqid_ds *buf      
72 sys_fcntl unsigned int fd unsigned int cmd unsigned long arg      
73 sys_flock unsigned int fd unsigned int cmd        
74 sys_fsync unsigned int fd          
75 sys_fdatasync unsigned int fd          
76 sys_truncate const char *path long length        
77 sys_ftruncate unsigned int fd unsigned long length        
78 sys_getdents unsigned int fd struct linux_dirent *dirent unsigned int count      
79 sys_getcwd char *buf unsigned long size        
80 sys_chdir const char *filename          
81 sys_fchdir unsigned int fd          
82 sys_rename const char *oldname const char *newname        
83 sys_mkdir const char *pathname int mode        
84 sys_rmdir const char *pathname          
85 sys_creat const char *pathname int mode        
86 sys_link const char *oldname const char *newname        
87 sys_unlink const char *pathname          
88 sys_symlink const char *oldname const char *newname        
89 sys_readlink const char *path char *buf int bufsiz      
90 sys_chmod const char *filename mode_t mode        
91 sys_fchmod unsigned int fd mode_t mode        
92 sys_chown const char *filename uid_t user gid_t group      
93 sys_fchown unsigned int fd uid_t user gid_t group      
94 sys_lchown const char *filename uid_t user gid_t group      
95 sys_umask int mask          
96 sys_gettimeofday struct timeval *tv struct timezone *tz        
97 sys_getrlimit unsigned int resource struct rlimit *rlim        
98 sys_getrusage int who struct rusage *ru        
99 sys_sysinfo struct sysinfo *info          
100 sys_times struct tms *tbuf          
101 sys_ptrace long request long pid unsigned long addr unsigned long data    
102 sys_getuid            
103 sys_syslog int type char *buf int len      
104 sys_getgid            
105 sys_setuid uid_t uid          
106 sys_setgid gid_t gid          
107 sys_geteuid            
108 sys_getegid            
109 sys_setpgid pid_t pid pid_t pgid        
110 sys_getppid            
111 sys_getpgrp            
112 sys_setsid            
113 sys_setreuid uid_t ruid uid_t euid        
114 sys_setregid gid_t rgid gid_t egid        
115 sys_getgroups int gidsetsize gid_t *grouplist        
116 sys_setgroups int gidsetsize gid_t *grouplist        
117 sys_setresuid uid_t ruid uid_t euid uid_t suid      
118 sys_getresuid uid_t *ruid uid_t *euid uid_t *suid      
119 sys_setresgid gid_t rgid gid_t egid gid_t sgid      
120 sys_getresgid gid_t *rgid gid_t *egid gid_t *sgid      
121 sys_getpgid pid_t pid          
122 sys_setfsuid uid_t uid          
123 sys_setfsgid gid_t gid          
124 sys_getsid pid_t pid          
125 sys_capget cap_user_header_t header cap_user_data_t dataptr        
126 sys_capset cap_user_header_t header const cap_user_data_t data        
127 sys_rt_sigpending sigset_t *set size_t sigsetsize        
128 sys_rt_sigtimedwait const sigset_t *uthese siginfo_t *uinfo const struct timespec *uts size_t sigsetsize    
129 sys_rt_sigqueueinfo pid_t pid int sig siginfo_t *uinfo      
130 sys_rt_sigsuspend sigset_t *unewset size_t sigsetsize        
131 sys_sigaltstack const stack_t *uss stack_t *uoss        
132 sys_utime char *filename struct utimbuf *times        
133 sys_mknod const char *filename int mode unsigned dev      
134 sys_uselib NOT IMPLEMENTED          
135 sys_personality unsigned int personality          
136 sys_ustat unsigned dev struct ustat *ubuf        
137 sys_statfs const char *pathname struct statfs *buf        
138 sys_fstatfs unsigned int fd struct statfs *buf        
139 sys_sysfs int option unsigned long arg1 unsigned long arg2      
140 sys_getpriority int which int who        
141 sys_setpriority int which int who int niceval      
142 sys_sched_setparam pid_t pid struct sched_param *param        
143 sys_sched_getparam pid_t pid struct sched_param *param        
144 sys_sched_setscheduler pid_t pid int policy struct sched_param *param      
145 sys_sched_getscheduler pid_t pid          
146 sys_sched_get_priority_max int policy          
147 sys_sched_get_priority_min int policy          
148 sys_sched_rr_get_interval pid_t pid struct timespec *interval        
149 sys_mlock unsigned long start size_t len        
150 sys_munlock unsigned long start size_t len        
151 sys_mlockall int flags          
152 sys_munlockall            
153 sys_vhangup            
154 sys_modify_ldt int func void *ptr unsigned long bytecount      
155 sys_pivot_root const char *new_root const char *put_old        
156 sys__sysctl struct __sysctl_args *args          
157 sys_prctl int option unsigned long arg2 unsigned long arg3 unsigned long arg4 unsigned long arg5  
158 sys_arch_prctl int code unsigned long addr        
159 sys_adjtimex struct timex *txc_p          
160 sys_setrlimit unsigned int resource struct rlimit *rlim        
161 sys_chroot const char *filename          
162 sys_sync            
163 sys_acct const char *name          
164 sys_settimeofday struct timeval *tv struct timezone *tz        
165 sys_mount char *dev_name char *dir_name char *type unsigned long flags void *data  
166 sys_umount2 const char *target int flags        
167 sys_swapon const char *specialfile int swap_flags        
168 sys_swapoff const char *specialfile          
169 sys_reboot int magic1 int magic2 unsigned int cmd void *arg    
170 sys_sethostname char *name int len        
171 sys_setdomainname char *name int len        
172 sys_iopl unsigned int level struct pt_regs *regs        
173 sys_ioperm unsigned long from unsigned long num int turn_on      
174 sys_create_module REMOVED IN Linux 2.6          
175 sys_init_module void *umod unsigned long len const char *uargs      
176 sys_delete_module const char *name_user unsigned int flags        
177 sys_get_kernel_syms REMOVED IN Linux 2.6          
178 sys_query_module REMOVED IN Linux 2.6          
179 sys_quotactl unsigned int cmd const char *special qid_t id void *addr    
180 sys_nfsservctl NOT IMPLEMENTED          
181 sys_getpmsg NOT IMPLEMENTED          
182 sys_putpmsg NOT IMPLEMENTED          
183 sys_afs_syscall NOT IMPLEMENTED          
184 sys_tuxcall NOT IMPLEMENTED          
185 sys_security NOT IMPLEMENTED          
186 sys_gettid            
187 sys_readahead int fd loff_t offset size_t count      
188 sys_setxattr const char *pathname const char *name const void *value size_t size int flags  
189 sys_lsetxattr const char *pathname const char *name const void *value size_t size int flags  
190 sys_fsetxattr int fd const char *name const void *value size_t size int flags  
191 sys_getxattr const char *pathname const char *name void *value size_t size    
192 sys_lgetxattr const char *pathname const char *name void *value size_t size    
193 sys_fgetxattr int fd const char *name void *value size_t size    
194 sys_listxattr const char *pathname char *list size_t size      
195 sys_llistxattr const char *pathname char *list size_t size      
196 sys_flistxattr int fd char *list size_t size      
197 sys_removexattr const char *pathname const char *name        
198 sys_lremovexattr const char *pathname const char *name        
199 sys_fremovexattr int fd const char *name        
200 sys_tkill pid_t pid int sig        
201 sys_time time_t *tloc          
202 sys_futex u32 *uaddr int op u32 val struct timespec *utime u32 *uaddr2 u32 val3
203 sys_sched_setaffinity pid_t pid unsigned int len unsigned long *user_mask_ptr      
204 sys_sched_getaffinity pid_t pid unsigned int len unsigned long *user_mask_ptr      
205 sys_set_thread_area NOT IMPLEMENTED. Use arch_prctl          
206 sys_io_setup unsigned nr_events aio_context_t *ctxp        
207 sys_io_destroy aio_context_t ctx          
208 sys_io_getevents aio_context_t ctx_id long min_nr long nr struct io_event *events    
209 sys_io_submit aio_context_t ctx_id long nr struct iocb **iocbpp      
210 sys_io_cancel aio_context_t ctx_id struct iocb *iocb struct io_event *result      
211 sys_get_thread_area NOT IMPLEMENTED. Use arch_prctl          
212 sys_lookup_dcookie u64 cookie64 long buf long len      
213 sys_epoll_create int size          
214 sys_epoll_ctl_old NOT IMPLEMENTED          
215 sys_epoll_wait_old NOT IMPLEMENTED          
216 sys_remap_file_pages unsigned long start unsigned long size unsigned long prot unsigned long pgoff unsigned long flags  
217 sys_getdents64 unsigned int fd struct linux_dirent64 *dirent unsigned int count      
218 sys_set_tid_address int *tidptr          
219 sys_restart_syscall            
220 sys_semtimedop int semid struct sembuf *tsops unsigned nsops const struct timespec *timeout    
221 sys_fadvise64 int fd loff_t offset size_t len int advice    
222 sys_timer_create const clockid_t which_clock struct sigevent *timer_event_spec timer_t *created_timer_id      
223 sys_timer_settime timer_t timer_id int flags const struct itimerspec *new_setting struct itimerspec *old_setting    
224 sys_timer_gettime timer_t timer_id struct itimerspec *setting        
225 sys_timer_getoverrun timer_t timer_id          
226 sys_timer_delete timer_t timer_id          
227 sys_clock_settime const clockid_t which_clock const struct timespec *tp        
228 sys_clock_gettime const clockid_t which_clock struct timespec *tp        
229 sys_clock_getres const clockid_t which_clock struct timespec *tp        
230 sys_clock_nanosleep const clockid_t which_clock int flags const struct timespec *rqtp struct timespec *rmtp    
231 sys_exit_group int error_code          
232 sys_epoll_wait int epfd struct epoll_event *events int maxevents int timeout    
233 sys_epoll_ctl int epfd int op int fd struct epoll_event *event    
234 sys_tgkill pid_t tgid pid_t pid int sig      
235 sys_utimes char *filename struct timeval *utimes        
236 sys_vserver NOT IMPLEMENTED          
237 sys_mbind unsigned long start unsigned long len unsigned long mode unsigned long *nmask unsigned long maxnode unsigned flags
238 sys_set_mempolicy int mode unsigned long *nmask unsigned long maxnode      
239 sys_get_mempolicy int *policy unsigned long *nmask unsigned long maxnode unsigned long addr unsigned long flags  
240 sys_mq_open const char *u_name int oflag mode_t mode struct mq_attr *u_attr    
241 sys_mq_unlink const char *u_name          
242 sys_mq_timedsend mqd_t mqdes const char *u_msg_ptr size_t msg_len unsigned int msg_prio const struct timespec *u_abs_timeout  
243 sys_mq_timedreceive mqd_t mqdes char *u_msg_ptr size_t msg_len unsigned int *u_msg_prio const struct timespec *u_abs_timeout  
244 sys_mq_notify mqd_t mqdes const struct sigevent *u_notification        
245 sys_mq_getsetattr mqd_t mqdes const struct mq_attr *u_mqstat struct mq_attr *u_omqstat      
246 sys_kexec_load unsigned long entry unsigned long nr_segments struct kexec_segment *segments unsigned long flags    
247 sys_waitid int which pid_t upid struct siginfo *infop int options struct rusage *ru  
248 sys_add_key const char *_type const char *_description const void *_payload size_t plen    
249 sys_request_key const char *_type const char *_description const char *_callout_info key_serial_t destringid    
250 sys_keyctl int option unsigned long arg2 unsigned long arg3 unsigned long arg4 unsigned long arg5  
251 sys_ioprio_set int which int who int ioprio      
252 sys_ioprio_get int which int who        
253 sys_inotify_init            
254 sys_inotify_add_watch int fd const char *pathname u32 mask      
255 sys_inotify_rm_watch int fd __s32 wd        
256 sys_migrate_pages pid_t pid unsigned long maxnode const unsigned long *old_nodes const unsigned long *new_nodes    
257 sys_openat int dfd const char *filename int flags int mode    
258 sys_mkdirat int dfd const char *pathname int mode      
259 sys_mknodat int dfd const char *filename int mode unsigned dev    
260 sys_fchownat int dfd const char *filename uid_t user gid_t group int flag  
261 sys_futimesat int dfd const char *filename struct timeval *utimes      
262 sys_newfstatat int dfd const char *filename struct stat *statbuf int flag    
263 sys_unlinkat int dfd const char *pathname int flag      
264 sys_renameat int oldfd const char *oldname int newfd const char *newname    
265 sys_linkat int oldfd const char *oldname int newfd const char *newname int flags  
266 sys_symlinkat const char *oldname int newfd const char *newname      
267 sys_readlinkat int dfd const char *pathname char *buf int bufsiz    
268 sys_fchmodat int dfd const char *filename mode_t mode      
269 sys_faccessat int dfd const char *filename int mode      
270 sys_pselect6 int n fd_set *inp fd_set *outp fd_set *exp struct timespec *tsp void *sig
271 sys_ppoll struct pollfd *ufds unsigned int nfds struct timespec *tsp const sigset_t *sigmask size_t sigsetsize  
272 sys_unshare unsigned long unshare_flags          
273 sys_set_robust_list struct robust_list_head *head size_t len        
274 sys_get_robust_list int pid struct robust_list_head **head_ptr size_t *len_ptr      
275 sys_splice int fd_in loff_t *off_in int fd_out loff_t *off_out size_t len unsigned int flags
276 sys_tee int fdin int fdout size_t len unsigned int flags    
277 sys_sync_file_range long fd loff_t offset loff_t bytes long flags    
278 sys_vmsplice int fd const struct iovec *iov unsigned long nr_segs unsigned int flags    
279 sys_move_pages pid_t pid unsigned long nr_pages const void **pages const int *nodes int *status int flags
280 sys_utimensat int dfd const char *filename struct timespec *utimes int flags    
281 sys_epoll_pwait int epfd struct epoll_event *events int maxevents int timeout const sigset_t *sigmask size_t sigsetsize
282 sys_signalfd int ufd sigset_t *user_mask size_t sizemask      
283 sys_timerfd_create int clockid int flags        
284 sys_eventfd unsigned int count          
285 sys_fallocate long fd long mode loff_t offset loff_t len    
286 sys_timerfd_settime int ufd int flags const struct itimerspec *utmr struct itimerspec *otmr    
287 sys_timerfd_gettime int ufd struct itimerspec *otmr        
288 sys_accept4 int fd struct sockaddr *upeer_sockaddr int *upeer_addrlen int flags    
289 sys_signalfd4 int ufd sigset_t *user_mask size_t sizemask int flags    
290 sys_eventfd2 unsigned int count int flags        
291 sys_epoll_create1 int flags          
292 sys_dup3 unsigned int oldfd unsigned int newfd int flags      
293 sys_pipe2 int *filedes int flags        
294 sys_inotify_init1 int flags          
295 sys_preadv unsigned long fd const struct iovec *vec unsigned long vlen unsigned long pos_l unsigned long pos_h  
296 sys_pwritev unsigned long fd const struct iovec *vec unsigned long vlen unsigned long pos_l unsigned long pos_h  
297 sys_rt_tgsigqueueinfo pid_t tgid pid_t pid int sig siginfo_t *uinfo    
298 sys_perf_event_open struct perf_event_attr *attr_uptr pid_t pid int cpu int group_fd unsigned long flags  
299 sys_recvmmsg int fd struct mmsghdr *mmsg unsigned int vlen unsigned int flags struct timespec *timeout  
300 sys_fanotify_init unsigned int flags unsigned int event_f_flags        
301 sys_fanotify_mark long fanotify_fd long flags __u64 mask long dfd long pathname  
302 sys_prlimit64 pid_t pid unsigned int resource const struct rlimit64 *new_rlim struct rlimit64 *old_rlim    
303 sys_name_to_handle_at int dfd const char *name struct file_handle *handle int *mnt_id int flag  
304 sys_open_by_handle_at int mountdirfd struct file_handle *handle int flags      
305 sys_clock_adjtime clockid_t which_clock struct timex *tx        
306 sys_syncfs int fd          
307 sys_sendmmsg int fd struct mmsghdr *mmsg unsigned int vlen unsigned int flags    
308 sys_setns int fd int nstype        
309 sys_getcpu unsigned *cpup unsigned *nodep struct getcpu_cache *unused      
310 sys_process_vm_readv pid_t pid const struct iovec *lvec unsigned long liovcnt const struct iovec *rvec unsigned long riovcnt unsigned long flags
311 sys_process_vm_writev pid_t pid const struct iovec *lvec unsigned long liovcnt const struct iovec *rvec unsigned long riovcnt unsigned long flags
312 sys_kcmp pid_t pid1 pid_t pid2 int type unsigned long idx1 unsigned long idx2  
313 sys_finit_module int fd const char __user *uargs int flags      
314 sys_sched_setattr pid_t pid struct sched_attr __user *attr unsigned int flags      
315 sys_sched_getattr pid_t pid struct sched_attr __user *attr unsigned int size unsigned int flags    
316 sys_renameat2 int olddfd const char __user *oldname int newdfd const char __user *newname unsigned int flags    
317 sys_seccomp unsigned int op unsigned int flags const char __user *uargs      
318 sys_getrandom char __user *buf size_t count unsigned int flags      
319 sys_memfd_create const char __user *uname_ptr unsigned int flags        
320 sys_kexec_file_load int kernel_fd int initrd_fd unsigned long cmdline_len const char __user *cmdline_ptr unsigned long flags  
321 sys_bpf int cmd union bpf_attr *attr unsigned int size      
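
The numbers in the first column of the table can be passed directly to libc's generic syscall() wrapper. The following is a minimal sketch, assuming a 64-bit Linux process on x86-64 and a Python build with ctypes; the helper names raw_syscall and on_x86_64_linux are made up for this illustration, and the constant 186 is simply the sys_gettid row copied from the table (it is different on other architectures).

```python
import ctypes
import os
import platform
import sys

# Taken from the x86-64 table above (row "186 sys_gettid").
# Assumption: this value is only valid on x86-64 Linux.
SYS_GETTID = 186


def on_x86_64_linux():
    """Return True when the numbers in the table apply to this process."""
    return (
        sys.platform.startswith("linux")
        and platform.machine() == "x86_64"
        and sys.maxsize > 2**32  # 64-bit interpreter, not a 32-bit build
    )


def raw_syscall(number, *args):
    """Invoke a system call by number through libc's syscall() wrapper.

    Arguments should be passed as explicit ctypes values (for example
    ctypes.c_long or ctypes.c_void_p) so that they are widened correctly.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    result = libc.syscall(ctypes.c_long(number), *args)
    if result == -1:
        # libc reports a failed system call as -1 and stores the code in errno.
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


if __name__ == "__main__":
    if on_x86_64_linux():
        # gettid takes no arguments and returns the kernel thread ID.
        print("gettid via raw syscall:", raw_syscall(SYS_GETTID))
    else:
        print("The syscall numbers in this table do not apply on this platform.")
```

In ordinary programs it is usually better to call the libc wrapper for a system call (or, in C, to use the SYS_* constants from <sys/syscall.h>) than to hard-code numbers, since the numbering is architecture specific.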

References:
http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64
https://filippo.io/linux-syscall-table/

Author: cig01

Created: <2013-01-12 Sat>

Last updated: <2018-01-26 Fri>

Creator: Emacs 27.1 (Org mode 9.4)