C


1 Introduction to C

C was originally developed by Dennis Ritchie between 1969 and 1973 at AT&T Bell Labs and was used to re-implement the Unix operating system. The developers had considered rewriting the system in B, Thompson's simplified version of BCPL. However, B could not take advantage of some of the PDP-11's features, notably byte addressability, and this led to the development of C.

References:
https://en.wikipedia.org/wiki/The_C_Programming_Language
The C Programming Language, 2nd: http://www.ime.usp.br/~pf/Kernighan-Ritchie/C-Programming-Ebook.pdf
C Programming - A Modern Approach, 2nd Edition: http://www.amazon.com/C-Programming-Modern-Approach-2nd/dp/0393979504/

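1.1 C language standards

In 1990, the International Organization for Standardization (ISO) adopted the 1989 ANSI C standard as the ISO C standard (ISO/IEC 9899:1990). This version is commonly called ANSI C (also C89 or C90).
In 1999, ISO revised the C standard. While largely preserving the character of the language, the revision added features to meet the needs of applications; it was published as ISO/IEC 9899:1999 (C99).
On December 8, 2011, ISO officially published the new C standard, ISO/IEC 9899:2011, known as C11.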

The latest publicly available version of the C99 standard is the combined C99 + TC1 + TC2 + TC3, WG14 N1256, dated 2007-09-07.
The latest publicly available version of the C11 standard is the document WG14 N1570, dated 2011-04-12.

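Table 1: Evolution of the C standard headers (number of headers in parentheses)
+-------------------+-------------------------------+----------------+-----------------+
| ANSI C, 1990 (15) | ANSI C Amendment 1, 1995 (18) | C99, 1999 (24) | C11, 2011 (29)  |
+-------------------+-------------------------------+----------------+-----------------+
| <assert.h>        | <assert.h>                    | <assert.h>     | <assert.h>      |
| <ctype.h>         | <ctype.h>                     | <ctype.h>      | <ctype.h>       |
| <errno.h>         | <errno.h>                     | <errno.h>      | <errno.h>       |
| <float.h>         | <float.h>                     | <float.h>      | <float.h>       |
| <limits.h>        | <limits.h>                    | <limits.h>     | <limits.h>      |
| <locale.h>        | <locale.h>                    | <locale.h>     | <locale.h>      |
| <math.h>          | <math.h>                      | <math.h>       | <math.h>        |
| <setjmp.h>        | <setjmp.h>                    | <setjmp.h>     | <setjmp.h>      |
| <signal.h>        | <signal.h>                    | <signal.h>     | <signal.h>      |
| <stdarg.h>        | <stdarg.h>                    | <stdarg.h>     | <stdarg.h>      |
| <stddef.h>        | <stddef.h>                    | <stddef.h>     | <stddef.h>      |
| <stdio.h>         | <stdio.h>                     | <stdio.h>      | <stdio.h>       |
| <stdlib.h>        | <stdlib.h>                    | <stdlib.h>     | <stdlib.h>      |
| <string.h>        | <string.h>                    | <string.h>     | <string.h>      |
| <time.h>          | <time.h>                      | <time.h>       | <time.h>        |
|                   | <iso646.h>                    | <iso646.h>     | <iso646.h>      |
|                   | <wchar.h>                     | <wchar.h>      | <wchar.h>       |
|                   | <wctype.h>                    | <wctype.h>     | <wctype.h>      |
|                   |                               | <complex.h>    | <complex.h>     |
|                   |                               | <fenv.h>       | <fenv.h>        |
|                   |                               | <inttypes.h>   | <inttypes.h>    |
|                   |                               | <stdbool.h>    | <stdbool.h>     |
|                   |                               | <stdint.h>     | <stdint.h>      |
|                   |                               | <tgmath.h>     | <tgmath.h>      |
|                   |                               |                | <stdalign.h>    |
|                   |                               |                | <stdatomic.h>   |
|                   |                               |                | <stdnoreturn.h> |
|                   |                               |                | <threads.h>     |
|                   |                               |                | <uchar.h>       |
+-------------------+-------------------------------+----------------+-----------------+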

References:
C committee website: http://www.open-std.org/jtc1/sc22/wg14/
ISO/IEC 9899 - Programming languages - C: http://www.open-std.org/jtc1/sc22/wg14/www/standards

2 C data types

2.1 Sizes of numeric types in bytes

Sizes in bytes of the numeric types in C (these may differ between compilers).

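Table 2: Sizes in bytes of the numeric types in C
+-----------------------------------+----------------+----------------+
| C numeric type declaration        | 32-bit machine | 64-bit machine |
+-----------------------------------+----------------+----------------+
| [signed/unsigned] char            | 1              | 1              |
| [signed/unsigned] short [int]     | 2              | 2              |
| [signed/unsigned] int             | 4              | 4              |
| [signed/unsigned] long [int]      | 4              | 8              |
| [signed/unsigned] long long [int] | 8              | 8              |
| char *                            | 4              | 8              |
| float                             | 4              | 4              |
| double                            | 8              | 8              |
| long double                       | -              | -              |
+-----------------------------------+----------------+----------------+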

Reference: Computer Systems: A Programmer's Perspective (2nd edition), Section 2.1.3

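2.2 Fixed-width integers (int32_t, uint16_t)

For better portability, C99 introduced the header stdint.h, which defines integer types of exact width.
They are named intN_t and uintN_t, denoting signed and unsigned integers of exactly N bits. N is usually 8, 16, 32, or 64.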

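2.3 Floating-point numbers

Most C implementations represent floating-point numbers according to IEEE 754, although the C standard does not require it.

Single precision typically corresponds to float in C.
Double precision typically corresponds to double in C.
Extended precision typically corresponds to long double in C (for example, on x86 with GCC).

Online IEEE 754 converters: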
http://www.h-schmidt.net/FloatConverter/
http://babbage.cs.qc.cuny.edu/IEEE-754/

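2.4 Binary representation of signed integers

On virtually all modern implementations, signed integers are represented in two's complement.
For a non-negative integer, the two's complement representation is the same as its plain binary representation. For a negative integer, it is obtained by taking the binary form of its absolute value, inverting every bit, and adding 1.

Consequently, for a signed integer, a leftmost (most significant) bit of 0 means the value is non-negative, and a leftmost bit of 1 means it is negative.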

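#include <stdio.h>

int main(void)
{
    /* %x expects an unsigned int, so convert explicitly.
       The output shown assumes a 32-bit int. */
    printf(" 1=0x%08x\n", (unsigned)1);    /*  1=0x00000001 */
    printf("-1=0x%08x\n", (unsigned)-1);   /* -1=0xffffffff */
    printf("-2=0x%08x\n", (unsigned)-2);   /* -2=0xfffffffe */
    return 0;
}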

2.5 Formatted input/output

Reference: ISO/IEC 9899:1999 (E), 7.19.6 Formatted input/output functions

2.5.1 Formatted output

The printf functions provide formatted output conversion.

int fprintf(FILE *stream, const char *format, ...);

The format string contains two types of objects: ordinary characters, which are copied to the output stream, and conversion specifications.
Each conversion specification begins with the character % and ends with a conversion specifier character. Between the % and the conversion specifier there may be, in order:

  • Zero or more flags (in any order).
  • An optional minimum field width.
  • An optional period, which separates the field width from the precision.
  • An optional precision.
  • An optional length modifier.
Table 3: printf conversion specifiers
Specifier  Argument type; converted to
d, i       int; signed integer, decimal.
o          unsigned int; unsigned integer, octal.
x, X       unsigned int; unsigned integer, hexadecimal.
u          unsigned int; unsigned integer, decimal.
f          double; decimal notation of the form [-]mmm.ddd, where the number of d's is specified by the precision. The default precision is 6; a precision of 0 suppresses the decimal point.
e, E       double; decimal notation of the form [-]m.dddddd e±xx or [-]m.dddddd E±xx, where the number of d's is specified by the precision. The default precision is 6; a precision of 0 suppresses the decimal point.
g, G       double; %e or %E is used if the exponent is less than -4 or greater than or equal to the precision; otherwise %f is used. Trailing zeros and a trailing decimal point are not printed.
a, A       double; hexadecimal floating-point notation of the form [-]0xh.hhhhp±d.
c          int; the int argument is converted to an unsigned char.
s          char *; characters from the string are printed until a '\0' is reached or until the number of characters indicated by the precision have been printed.
p          void *; print as a pointer.
n          int *; the argument shall be a pointer to a signed integer into which is written the number of characters written to the output stream so far by this call to the printf function. No argument is converted, but one is consumed.
%          no argument is converted; print a %.

Example: using %n with printf

#include <stdio.h>
int main()
{
    int count1;
    int count2;

    printf("ABCDE%nFGHI%n\n", &count1, &count2);
    printf("First count is %d, second count is %d.\n", count1, count2);
    // The line above prints: First count is 5, second count is 9.
    return 0;
}
2.5.1.1 flags
Table 4: flags (Formatted output)
flag characters meanings
- The result of the conversion is left-justified within the field. (It is right-justified if this flag is not specified.)
+ The result of a signed conversion always begins with a plus or minus sign.
space If the first character of a signed conversion is not a sign, or if a signed conversion results in no characters, a space is prefixed to the result. If the space and + flags both appear, the space flag is ignored.
# The result is converted to an alternative form. For o conversion, it increases the precision, if and only if necessary, to force the first digit of the result to be a zero (if the value and precision are both 0, a single 0 is printed). For x (or X) conversion, a nonzero result has 0x (or 0X) prefixed to it. For a, A, e, E, f, F, g, and G conversions, the result of converting a floating-point number always contains a decimal-point character, even if no digits follow it. (Normally, a decimal-point character appears in the result of these conversions only if a digit follows it.) For g and G conversions, trailing zeros are not removed from the result. For other conversions, the behavior is undefined.
0 For d, i, o, u, x, X, a, A, e, E, f, F, g, and G conversions, leading zeros (following any indication of sign or base) are used to pad to the field width rather than performing space padding, except when converting an infinity or NaN. If the 0 and - flags both appear, the 0 flag is ignored. For d, i, o, u, x, and X conversions, if a precision is specified, the 0 flag is ignored. For other conversions, the behavior is undefined.
2.5.1.2 length modifier
Table 5: length modifier (Formatted output)
length modifier characters meanings
hh Indicates the argument is a signed char or unsigned char
h Indicates the argument is a short int or unsigned short int
l Indicates the argument is a long int or unsigned long int
ll Indicates the argument is a long long int or unsigned long long int
j Indicates the argument is an intmax_t or uintmax_t (declared in <stdint.h>)
z Indicates the argument is a size_t
t Indicates the argument is a ptrdiff_t (In <stddef.h>)
L Indicates the argument is a long double

Example: to print a size_t value in decimal, C99 provides %zu, where z is the length modifier and u is the conversion specifier character.

2.5.2 Formatted input

The scanf functions deal with formatted input conversion.

int fscanf(FILE *stream, const char *format, ...);

fscanf reads from stream under control of format, and assigns converted values through subsequent arguments, each of which must be a pointer. It returns when format is exhausted. fscanf returns EOF if end of file or an error occurs before any conversion; otherwise it returns the number of input items converted and assigned.

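Note: the format strings of the scanf family resemble those of printf, but there are many differences. For example, scanf has no precision control.

Reference: ISO/IEC 9899:1999 (E), 7.19.6.2 The fscanf function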

2.5.2.1 assignment suppression (*)

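To skip part of the input, mark the conversion with an asterisk *. For example, %*c reads one character and discards it.

Example: skipping one character

#include <stdio.h>
int main(void)
{
    int x, y;
    /* For the input 123/456, x receives 123 and y receives 456;
       the '/' is read by %*c and discarded. */
    if (scanf("%d%*c%d", &x, &y) == 2)
        printf("x is %d, y is %d\n", x, y);
    return 0;
}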
2.5.2.2 scanset ([…], [^…])

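A scanset defines a set of characters; scanf() reads the longest run of input characters that belong to the set and stores them in the corresponding character array.
A scanset is written as a string of characters between square brackets, and the left bracket must be preceded by a percent sign.
If the first character inside the brackets is ^, the set is negated: it matches every character that is not listed.

Example: how can scanf() read a string that contains spaces?

#include <stdio.h>
int main(void)
{
    char str[100];
    /* scanf("%s", str) stops at the first whitespace character; the
       scanset %[^\n] reads up to the end of the line instead. The width 99
       keeps the input within str (one byte is reserved for '\0'). */
    if (scanf("%99[^\n]", str) == 1)
        printf("%s\n", str);
    return 0;
}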

2.6 typedef

typedef creates a new name for an existing type. It does not create a new type.

typedef existing_type new_type_name;

For example:

typedef char C;
typedef unsigned int WORD;

2.6.1 typedef and arrays

typedef can create an alias for an array type, for example:

typedef char Line[81];
Line line, secondline;

This is equivalent to:

char line[81];
char secondline[81];

2.6.2 typedef and function pointers

typedef can create an alias for a function pointer type, for example:

typedef void (*PrintHelloHandle)(int);
PrintHelloHandle pFunc;

This is equivalent to:

void (*pFunc)(int);

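3 C operators

3.1 Operator precedence and associativity in C

C operators fall into 15 precedence levels; among operators of the same precedence, the order in which they are applied is determined by their associativity.

The following table lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.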

+-----------------+--------------------------------------------+---------------+
|    Operator     |          Description                       | Associativity |
+-----------------+--------------------------------------------+---------------+
| ()              | Function call                              | left-to-right |
| []              | Array subscript                            |               |
| .               | Member of structure via object name        |               |
| ->              | Member of structure via pointer            |               |
+-----------------+--------------------------------------------+---------------+
| ++ --           | Increment/decrement                        | right-to-left |
| + -             | Unary plus/minus                           |               |
| ! ~             | Logical negation/bitwise complement        |               |
| (type)          | Cast                                       |               |
| *               | Dereference                                |               |
| &               | Address (of operand)                       |               |
| sizeof          | Determine size in bytes                    |               |
+-----------------+--------------------------------------------+---------------+
| *  /  %         | Multiplication/division/modulus            | left-to-right |
+-----------------+--------------------------------------------+---------------+
| + -             | Addition/subtraction                       | left-to-right |
+-----------------+--------------------------------------------+---------------+
| <<  >>          | Bitwise shift left, Bitwise shift right    | left-to-right |
+-----------------+--------------------------------------------+---------------+
| <  <=           | Less than/less than or equal to            | left-to-right |
| >  >=           | Greater than/greater than or equal to      |               |
+-----------------+--------------------------------------------+---------------+
| ==  !=          | Equal to/not equal to                      | left-to-right |
+-----------------+--------------------------------------------+---------------+
| &               | Bitwise AND                                | left-to-right |
+-----------------+--------------------------------------------+---------------+
| ^               | Bitwise exclusive OR                       | left-to-right |
+-----------------+--------------------------------------------+---------------+
| |               | Bitwise inclusive OR                       | left-to-right |
+-----------------+--------------------------------------------+---------------+
| &&              | Logical AND                                | left-to-right |
+-----------------+--------------------------------------------+---------------+
| ||              | Logical OR                                 | left-to-right |
+-----------------+--------------------------------------------+---------------+
| ? :             | Ternary conditional                        | right-to-left |
+-----------------+--------------------------------------------+---------------+
| =               | Assignment                                 | right-to-left |
| += -=           | Addition/subtraction assignment            |               |
| *= /=           | Multiplication/division assignment         |               |
| %= &=           | Modulus/bitwise AND assignment             |               |
| ^= |=           | Bitwise exclusive/inclusive OR assignment  |               |
| <<= >>=         | Bitwise shift left/right assignment        |               |
+-----------------+--------------------------------------------+---------------+
| ,               | Comma (separate expressions)               | left-to-right |
+-----------------+--------------------------------------------+---------------+

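The table above shows the following general pattern of precedence, from highest to lowest:

Primary operators () [] . ->
    ↓
Unary operators
    ↓
Arithmetic operators (multiplication, division, and modulus first, then addition and subtraction)
    ↓
Relational operators
    ↓
Logical operators (excluding the unary operator !)
    ↓
Conditional operator
    ↓
Assignment operators
    ↓
Comma operator

Note: the bitwise operators are scattered across the precedence levels. Bitwise complement (~) is a unary operator, the shifts << and >> come before the relational operators, and &, ^, and | come after the relational and equality operators.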

References:
http://www.difranco.net/compsci/C_Operator_Precedence_Table.htm
The C Programming Language, 2nd, 2.12 Precedence and Order of Evaluation

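3.2 Bitwise operators in C (&, |, ^, ~, >>, <<)

The exclusive OR (XOR) operator ^ works bit by bit: a result bit is 1 when the two operand bits differ and 0 when they are the same.

Table 6: The truth tables for &, |, and ^.
+-------+-------+-------+-------+-------+
|   p   |   q   | p & q | p | q | p ^ q |
+-------+-------+-------+-------+-------+
|   0   |   0   |   0   |   0   |   0   |
|   0   |   1   |   0   |   1   |   1   |
|   1   |   1   |   1   |   1   |   0   |
|   1   |   0   |   0   |   1   |   1   |
+-------+-------+-------+-------+-------+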

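3.2.1 Right shift

There are two kinds of right shift. For a shift by k bits, x >> k, a logical right shift fills the k leftmost bits with zeros, while an arithmetic right shift fills them with copies of the most significant bit.

The C standard does not say which kind is used for signed values: the result of right-shifting a negative value is implementation-defined. For unsigned values, the right shift must be logical. For signed values, almost all compilers use an arithmetic right shift (the left end is filled with the sign bit).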

4 C structures

Pointers to structures are so frequently used that an alternative notation (->) is provided as a shorthand.
If p is a pointer to a structure, then p->member-of-structure refers to the particular member.

Both . and -> associate from left to right, so if we have

struct point {
    int x;
    int y;
};

struct rect {
    struct point pt1;
    struct point pt2;
};

struct rect r, *rp = &r;

then these four expressions are equivalent:

    r.pt1.x
    rp->pt1.x
    (r.pt1).x
    (rp->pt1).x

Reference: The C Programming Language, 2nd, Section 6.2

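4.1 Structure memory alignment

Two rules govern how a structure is laid out in memory:
First, the starting address of each member must be a multiple of the alignment of its type; if it is not, padding bytes are inserted between the member and the previous one.
Second, the total size of the structure must be a multiple of the largest alignment among its members; if it is not, padding bytes are added after the last member.

4.1.1 Alignment of each type

The alignment of each type is as follows: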
The following typical alignments are valid for compilers from Microsoft (Visual C++), Borland/CodeGear (C++Builder), Digital Mars (DMC), and GNU (GCC) when compiling for 32-bit x86:

  • A char (one byte) will be 1-byte aligned.
  • A short (two bytes) will be 2-byte aligned.
  • An int (four bytes) will be 4-byte aligned.
  • A long (four bytes) will be 4-byte aligned.
  • A float (four bytes) will be 4-byte aligned.
  • A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).
  • A long long (eight bytes) will be 8-byte aligned.
  • A long double (ten bytes with C++Builder and DMC, eight bytes with Visual C++, twelve bytes with GCC) will be 8-byte aligned with C++Builder, 2-byte aligned with DMC, 8-byte aligned with Visual C++, and 4-byte aligned with GCC.
  • Any pointer (four bytes) will be 4-byte aligned. (e.g.: char*, int*)

The only notable differences in alignment for an LP64 64-bit system when compared to a 32-bit system are:

  • A long (eight bytes) will be 8-byte aligned.
  • A double (eight bytes) will be 8-byte aligned.
  • A long double (eight bytes with Visual C++, sixteen bytes with GCC) will be 8-byte aligned with Visual C++ and 16-byte aligned with GCC.
  • Any pointer (eight bytes) will be 8-byte aligned.

Reference: http://en.wikipedia.org/wiki/Data_structure_alignment

4.1.2 Structure alignment example

How many bytes does a variable of the following type occupy?

struct MixedData
{
    char Data1;
    short Data2;
    int Data3;
    char Data4;
};

The answer is 12 bytes. The analysis is as follows:

struct MixedData  /* After compilation in 32-bit(64-bit) x86 machine */
{
    char Data1;   /* 1 byte */
    char Padding1[1]; /* 1 byte for the following 'short' to be aligned on
                         a 2 byte boundary assuming that the address where
                         structure begins is an even number */
    short Data2;  /* 2 bytes */
    int Data3;    /* 4 bytes - largest structure member */
    char Data4;   /* 1 byte */
    char Padding2[3]; /* 3 bytes to make total size of the structure 12 bytes */
};

Example taken from: https://en.wikipedia.org/wiki/Data_structure_alignment

5 C pointers and arrays

Pointers and arrays are not the same thing. To illustrate, consider these two declarations:

int a[5];
int *b;

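Both a and b can be used with indirection and subscript operations, but they are very different.
When an array such as a is declared, the compiler reserves memory for the number of elements specified in the declaration and then creates the array name, whose value is a constant that points to the start of this storage.
When a pointer variable such as b is declared, the compiler reserves memory only for the pointer itself; it does not allocate memory for any integer values. Moreover, the pointer is not initialized to point to any existing memory, and if it is an automatic variable it is not initialized at all.
The two declarations can be pictured as follows: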

a
+----+----+----+----+----+
|    |    |    |    |    |
+----+----+----+----+----+

b
+----+
|    |
+----+

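Therefore, after these declarations, the expression *a is perfectly legal (it is the value of the first element of a), whereas *b accesses some indeterminate location in memory. On the other hand, the expression b++ compiles, but a++ does not, because the value of a is a constant.

Reference: Pointers on C, Section 8.1.5 Arrays and Pointers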

5.1 Pointers

A pointer is a variable that contains the address of a variable.

The unary operator & gives the address of an object.
The unary operator *, when applied to a pointer, accesses the object the pointer points to.

5.2 Array Decay to Pointer

"Decay" refers to the implicit conversion of an expression from an array type to a pointer type.

int a[] = { 1, 3, 5, 7, 9 };
int *p = a;

You lose the ability of the sizeof operator to count elements in the array:

printf("%zu\n", sizeof(a));   // prints 20 (with 4-byte int); dividing by sizeof(a[0]) gives 5 elements
printf("%zu\n", sizeof(p));   // prints 8 on a 64-bit machine; the element count cannot be recovered

This lost ability is referred to as "decay".

Reference: http://stackoverflow.com/questions/1461432/what-is-array-decaying

5.3 The array name as the address of the first element

Since the name of an array is a synonym for the location of the initial element, the assignment pa=&a[0] can also be written as pa = a.

int a[10];             /* defines an array of size 10 */
int *pa;
pa = &a[0];            /* sets pa to point to a[0] */
pa = a;                /* same as above */

5.4 Convert a[i] to *(a+i)

C converts a[i] to *(a+i) immediately; the two forms are always equivalent.
Thus, &a[i] and a+i are also identical.

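5.5 Two-dimensional arrays

int a[2][4] = { {0,1,2,3}, {4,5,6,7} };

This defines a two-dimensional array a, whose elements can be accessed as a[0][0], a[0][1], and so on.

The C compiler always treats a[i][j] as equivalent to *(a[i] + j) and to *(*(a+i) + j).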

5.5.1 Two-dimensional arrays and pointers: why is there a warning?

#include<stdio.h>

int main()
{
    int a[2][4] = { {0,1,2,3}, {4,5,6,7} };
    int *p=a;                 /* Wrong!!! Please use &a[0][0] or a[0] */
    printf("%d\n", *(p+1));
    return 0;
}

Compiling the program above (1.c) produces a warning like the following. Why?

cc     1.c   -o 1
1.c: In function 'main':
1.c:6:12: warning: initialization from incompatible pointer type
     int *p=a;
            ^

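a is a two-dimensional array with two elements, each of which is a one-dimensional array.
As noted earlier, an array name is the address of the array's first element, so a is the address of a one-dimensional array (its type is int (*)[4]). Assigning a to a pointer of type "int *" is therefore inappropriate, because the pointer types are incompatible. To remove the warning, write int *p=&a[0][0]; or, more briefly, int *p=a[0]; (because &x[i] and x+i are always identical; here x is a[0] and i is 0).

Summary: a[0] points to the first element of row 0, and a[1] points to the first element of row 1.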

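5.5.2 Two-dimensional arrays and pointers: how to define a "row pointer"

To define a row pointer, simply use the name of the two-dimensional array (the address of its first element).

#include<stdio.h>

int main()
{
    int a[2][4] = { {0,1,2,3}, {4,5,6,7} };
    int (*p)[4]=a;                           /* &a[0] also works, but is not necessary */
    printf("%p\n", (void *)*(p+1));          /* prints the address of the row {4,5,6,7} */
    printf("%d\n", *(*(p+1) + 1));           /* prints the number 5 */
    return 0;
}

a is a two-dimensional array with two elements, each of which is a one-dimensional array.
As noted earlier, an array name is the address of the array's first element, so a is the address of a one-dimensional array. A pointer p to the first row of the array can therefore be defined as int (*p)[4]=a; or equivalently as int (*p)[4]=&a[0]; (because &a[i] and a+i are always identical).

Reference: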
http://stackoverflow.com/questions/24578628/pointer-to-an-entire-row-in-a-2-d-array

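5.5.3 Can a two-dimensional array be converted to a "pointer to pointer"?

A two-dimensional array cannot be converted to a pointer to pointer; a 2D array and a pointer-to-pointer are incompatible types.

The following code is wrong and produces a warning at compile time (incompatible pointer types).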

  int a[2][3] = {{1,2,3}, {4,5,6}};
  int **p=a;                        /* Wrong! Incompatible pointer types. */

If a two-dimensional array really must be "converted" to a pointer to pointer, it can be done as follows:

  int a[2][3] = {{1,2,3}, {4,5,6}};
  int *a_rows[2] = {a[0], a[1]};      /* a_rows acts as an intermediary */
  int **p=a_rows;

References:
http://stackoverflow.com/questions/8203700/conversion-of-2d-array-to-pointer-to-pointer
http://stackoverflow.com/questions/1052818/create-a-pointer-to-two-dimensional-array

5.5.4 函数参数为二维数组

If a two-dimensional array is to be passed to a function, the parameter declaration in the function must include the number of columns; the number of rows is irrelevant.

For instance, consider this two-dimensional array:

int daytab[2][13] = {
    {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    {0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
};

If the array daytab is to be passed to a function f, the declaration of f would be:

f(int daytab[2][13]) { ... }

It could also be

f(int daytab[][13]) { ... }

since the number of rows is irrelevant, or it could be

f(int (*daytab)[13]) { ... }

which says that the parameter is a pointer to an array of 13 integers. The parentheses are necessary since brackets [] have higher precedence than *.

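Reference:
"The C Programming Language, 2nd" 5.7 Multi-dimensional Arrays

5.6 Initializing character arrays

If there are fewer initializers than array elements, all remaining elements are automatically set to the null character '\0'.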

char c[5]={'a', 'b'};
char c[5]={'a', 'b', '\0', '\0', '\0'};

The two definitions above are equivalent.

char c[]={"I am happy"};
char c[]="I am happy";
char c[]={'I',' ','a','m',' ','h','a','p','p','y','\0'};

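The three forms above are equivalent (length 11). Note that they differ from the following array (length 10), which has no terminating '\0'.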

char c[]={'I',' ','a','m',' ','h','a','p','p','y'};

5.7 Pointers to Pointers

When are pointers to pointers used?

  • The name of an array usually yields the address of its first element, so if the array contains elements of type t, the array name converts to type t *. Now consider an array whose elements are themselves pointers of type t *: its name converts to (t *)* = t **, a pointer to a pointer. (A true 2D array t a[m][n] is different: its name converts to t (*)[n], a pointer to an array, as described in section 5.5.3.)
  • An array of strings is usually stored as an array of char * pointers, one per string, so it is passed around as char ** (argv is the familiar example).
  • A function f will need to accept an argument of type t ** if it is to alter a variable of type t *.
  • Many other reasons that are too numerous to list here.

Reference: http://stackoverflow.com/questions/897366/how-do-pointer-to-pointers-work-in-c

5.7.1 Allocating memory for a "pointer to pointer"

We will take an example for allocating memory to a pointer to pointer to float values. Let the number of rows be '4' and the number of columns '3'.

float **float_values;

float_values = (float**) malloc(4 *sizeof(float*));         //allocate memory for rows

for(int i=0; i<4; i++) {
   *(float_values + i) = (float*) malloc(3 *sizeof(float)); //for each row allocate memory for columns
}

Figure 1: Allocating memory to a 'Pointer to Pointer' variable

Reference: http://www.codeproject.com/Articles/12449/Allocating-memory-to-a-Pointer-to-Pointer-variable

6 C declarations

The declaration syntax in C99:

declaration:
    declaration-specifiers [init-declarator-list];

declaration-specifiers:
    storage-class-specifier [declaration-specifiers]
    type-specifier [declaration-specifiers]
    type-qualifier [declaration-specifiers]

......

Reference: ISO/IEC 9899:1999 (E), 6.7 Declarations

6.1 Storage-class specifiers (extern, static, …)

Syntax of storage-class-specifier:

storage-class-specifier:
    typedef
    extern
    static
    auto
    register

Note: a declaration may contain at most one storage-class-specifier.
Note: typedef has nothing to do with storage classes; it is grouped with them only to simplify the grammar.
The typedef specifier is called a 'storage-class specifier' for syntactic convenience only.

References:
http://stackoverflow.com/questions/8674236/is-typedef-a-storage-class-specifier
ISO/IEC 9899:1999 (E), 6.7.1 Storage-class specifiers

6.1.1 Storage-class specifier: extern

extern changes the linkage. With this keyword, the function/variable is assumed to be available somewhere else and the resolving is deferred to the linker.

extern indicates that the function or variable is defined elsewhere.

6.1.1.1 extern applied to functions

Function declarations are extern by default.

int foo(int arg1, char arg2);
extern int foo(int arg1, char arg2);    /* extern keyword can be omitted */
6.1.1.2 extern applied to variables

If the program is in several source files, and a variable is defined in file1 and used in file2 and file3, then extern declarations are needed in file2 and file3 to connect the occurrences of the variable.

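extern example 1:

/* file1.c */
int a=1;

/* file2.c */
#include<stdio.h>
extern int a;             /* extern states that a is defined elsewhere */

int main()
{
    printf("a=%d\n", a);   /* prints a=1 */
    return 0;
}

The example above is the typical use of extern: stating that a global variable is defined in another file.
Note: if the extern keyword is removed from file2.c, the line becomes a tentative definition of a second, uninitialized global variable a. Strictly speaking, the program then contains two external definitions of a, which the C standard leaves undefined. Under the traditional Unix "common" model (for example GCC with -fcommon, the default before GCC 10), the uninitialized a is a weak symbol and the initialized a in file1.o is a strong symbol, so when file2.o and file1.o are linked the linker picks the initialized one and the program still prints a=1. With -fno-common (the default since GCC 10), the link fails with a multiple-definition error.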

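extern example 2:

#include<stdio.h>

int a=1;
int b=1;

int main()
{
    int a = 0;          /* a local a that hides the global a */
    extern int b;       /* extern states that b is defined elsewhere */

    printf("a=%d\n", a);   /* prints a=0 (the local a) */
    printf("b=%d\n", b);   /* prints b=1 */

    return 0;
}

Note: because b is defined earlier in the same file, the line extern int b; in the example above is redundant and is usually omitted in practice.

Reference: The C Programming Language, 2nd, 1.10 External Variables and Scope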

6.1.2 Storage-class specifier: static

The static declaration, applied to an external variable or function, limits the scope of that object to the rest of the source file being compiled.

6.1.2.1 static applied to local variables

The static declaration can also be applied to internal variables. Internal static variables are local to a particular function just as automatic variables are, but unlike automatics, they remain in existence rather than coming and going each time the function is activated. This means that internal static variables provide private, permanent storage within a single function.

6.1.3 Storage-class specifier: auto

auto is the default storage class for local variables. auto can only be used within functions.

Because local variables inside functions are auto by default, the auto keyword is almost never written explicitly.

6.1.4 Storage-class specifier: register

A register declaration advises the compiler that the variable in question will be heavily used.
The idea is that register variables are to be placed in machine registers, which may result in smaller and faster programs. But compilers are free to ignore the advice.

However, if an object is declared register, the unary & operator may not be applied to it, explicitly or implicitly. It is illegal to take the address of an object declared register, whether or not the compiler actually places it in a register.

6.2 Type specifiers (void, char, …)

Syntax of type-specifier:

type-specifier:
    void
    char
    short
    int
    long
    float
    double
    signed
    unsigned
    _Bool
    _Complex
    _Imaginary
    struct-or-union-specifier
    enum-specifier
    typedef-name

Several type specifiers may be combined, for example unsigned int.

Reference: ISO/IEC 9899:1999 (E), 6.7.2 Type specifiers

6.3 Type qualifiers (const, restrict, volatile)

Syntax of type-qualifier:

type-qualifier:
    const
    restrict
    volatile
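Table 7: Type qualifiers in C
Type qualifier  Description
const           read-only object
restrict        new in C99; can be applied only to pointers
volatile        object whose value may change in ways the compiler cannot see

Note: the standard does not limit a declaration to a single type qualifier. The following example uses const and volatile together:

extern const volatile int real_time_clock;
/* real_time_clock may be modifiable by hardware, but cannot
   be assigned to, incremented, or decremented. */

Reference: ISO/IEC 9899:1999 (E), 6.7.3 Type qualifiers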

6.3.1 Type qualifier: const

The qualifier const can be applied to the declaration of any variable to specify that its value will not be changed.

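Example: const and pointers

const int *A;        // const qualifies the pointed-to object: A can change, but the object cannot be changed through A
int const *A;        // const qualifies the pointed-to object: A can change, but the object cannot be changed through A (same as above)
int *const A;        // const qualifies the pointer A: A cannot change, but the object it points to can
const int *const A;  // neither A nor the object it points to can change

Note: a declaration such as int *const A without an initializer is useless (because A cannot be changed afterwards); it should be initialized where it is defined, for example int *const A=&a;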

6.3.2 Type qualifier: volatile

volatile prevents the compiler from optimizing code in ways that change the program's intended behavior.
Consider the following program:

XBYTE[2]=0x55;
XBYTE[2]=0x56;
XBYTE[2]=0x57;
XBYTE[2]=0x58;

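To the external hardware, these four statements are distinct operations that cause four different actions. The compiler, however, may optimize them, decide that only the last one has any effect, and drop the first three (emitting code for a single store). If XBYTE is declared volatile, the compiler compiles each statement in turn and emits machine code for all four stores.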

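6.3.2.1 volatile example
/* This program is not portable: it assumes x86-64, GCC-style inline assembly,
   and a particular stack layout. Different compilers give different results! */
#include <stdio.h>
int main()
{
    volatile int i = 10;    /* volatile: every read of i must fetch it from memory */
    int a,b;

    a = i;
    printf("i=%d\n", a);
    /* The assembly statement below is meant to set i in memory to 20 without the
       C compiler knowing. The offset 8(%rbp) is taken as-is from the source and
       must match the stack slot the compiler actually assigned to i. */
    asm ("movl $20,   8(%rbp)");

    b = i;
    printf("i=%d\n", b);

    return 0;
}

Because i is qualified with volatile, the program is expected to print:
i=10
i=20

If the volatile qualifier on i is removed, it may print the following (the C compiler does not know that the inline assembly modified i, so it optimizes b to the initial value of i):
i=10
i=10

References:
http://blog.chinaunix.net/uid-22906954-id-4598507.html
http://baike.baidu.com/view/608706.htm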

6.3.3 Type qualifier: restrict

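restrict was introduced in C99. It can be applied only to pointers, and it indicates that the pointer is the sole, initial means of accessing a data object. In other words, it tells the compiler that every modification of the memory the pointer refers to is made through that pointer and not through any other variable or pointer. This helps the compiler optimize better and generate more efficient assembly code.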

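6.4 How to read complicated C declarations

Declarations in C can become very complicated; the following precedence rules help in understanding them.

Precedence rules for C declarations

A Start reading the declaration at the name, then continue in order of precedence.
B The precedence, from high to low, is:
  B.1 the part of the declaration enclosed in parentheses;
  B.2 the postfix operators: parentheses () indicate a function, and square brackets [] indicate an array;
  B.3 the prefix operator: the asterisk * means "pointer to ...".
C If a const and/or volatile keyword is immediately followed by a type specifier (such as int or long), it applies to that type specifier. Otherwise, it applies to the pointer asterisk immediately to its left.

References:
Expert C Programming, Section 3.3
Online C declaration analyzer: http://cdecl.org/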

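6.4.1 Example of analyzing a C declaration

Analyze the following declaration:
char * const *(*next)();

Analysis:

Table 8: Analysis of the declaration char * const *(*next)();
Rule  Explanation
A     First, look at the variable name "next" and note that it is directly enclosed in parentheses.
B.1   So treat what is inside the parentheses as a unit, giving "next is a pointer to ...".
B     Then consider what lies outside the parentheses, choosing between the asterisk prefix and the parentheses suffix.
B.2   Rule B.2 says that the function parentheses on the right have higher precedence, giving "next is a function pointer, pointing to a function that returns ...".
B.3   Then handle the prefix "*", giving what the returned pointer points to.
C     Finally, interpret "char * const" as a constant pointer to char.

Result:
Summing up the analysis, the declaration means "next is a pointer to a function that returns a pointer to a constant pointer to char".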

6.5 Difference between definition and declaration

"Definition" refers to the place where the variable is created or assigned storage; "declaration" refers to places where the nature of the variable is stated but no storage is allocated.

7 C statements

7.1 The if statement

Form 1:

if (expression)
  statement1
else
  statement2

where the else part is optional.

Form 2:

if (expression)
  statement
else if (expression)
  statement
else if (expression)
  statement
else if (expression)
  statement
else
  statement

where the final else part is optional.

7.2 The switch statement

The switch statement is a multi-way decision that tests whether an expression matches one of a number of constant integer values, and branches accordingly.

switch (expression) {
  case const-expr: statements
  case const-expr: statements
  default: statements
}

Each case is labeled by one or more integer-valued constants or constant expressions. If a case matches the expression value, execution starts at that case. All case expressions must be different. The case labeled default is executed if none of the other cases are satisfied. A default is optional; if it isn't there and if none of the cases match, no action at all takes place. Cases and the default clause can occur in any order.

The break statement causes an immediate exit from the switch.

7.3 The for, while, and do-while statements

C has three loop statements, shown in Figure 2, Figure 3, and Figure 4 respectively.

Figure 2: for loop in C

Figure 3: while loop in C

Figure 4: do while loop in C

7.4 The break and continue statements

Table 9: break and continue in C
Control statement  Description
break     Terminates the loop or switch statement and transfers execution to the statement immediately following the loop or switch.
continue  Causes the loop to skip the remainder of its body and immediately retest its condition prior to reiterating.

Note: continue has no meaning for a switch statement itself; a continue inside a switch applies to the enclosing loop.

7.5 goto and labels

goto and labels are local to a function: a goto can only jump to a label in the same function.

8 C functions

8.1 Arguments - Call by Value

In C, all function arguments are passed by value. This means that the called function is given the values of its arguments in temporary variables rather than the originals. This leads to some different properties than are seen with call by reference languages like Fortran or with var parameters in Pascal, in which the called routine has access to the original argument, not a local copy.

When the name of an array is used as an argument, the value passed to the function is the address of the beginning of the array - there is no copying of array elements.
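A minimal sketch of these rules, using hypothetical function names: a function that receives an int works on a copy, a function that receives a pointer can modify the caller's object, and an array argument arrives as a pointer to its first element.

```c
/* Hypothetical helper: x is a local copy, so the caller's variable is untouched. */
void zero_by_value(int x)
{
    x = 0;
    (void)x;
}

/* Hypothetical helper: the pointer itself is passed by value, but it refers
   to the caller's object, which can be modified through it. */
void zero_by_pointer(int *x)
{
    *x = 0;
}

/* Hypothetical helper: a[] here is really int *a; the elements are not copied. */
void fill(int a[], int n, int value)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = value;
}
```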

8.1.1 Order of Argument Evaluation

The C99 standard does not specify the order in which function arguments are evaluated.

From C99

6.5.2.2 Function calls
...
10 The order of evaluation of the function designator, the actual arguments, and subexpressions within the actual arguments is unspecified, but there is a sequence point before the actual call.

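The clause above states that the order in which arguments are evaluated is unspecified, but that all of them are evaluated before the function is actually called, because there is a sequence point before the actual call.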

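8.1.2 A Parameter List of void

In C, an empty parameter list and a parameter list of void have different meanings. (This holds for C standards before C23, which gives an empty list the same meaning as void.)
Note: in C++ the two forms mean the same thing: the function takes no arguments.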

int func1();      // In C, declare function taking unspecified parameters
int func2(void);  // In C, declare function taking zero parameters

Reference: http://stackoverflow.com/questions/7140045/function-pointer-declaration

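8.2 Variable-Length Argument Lists

How can a function accept a variable number of arguments?

Since C89, the header stdarg.h defines one type, va_list, and three macros: va_start, va_arg, and va_end.

8.2.1 Variadic Function Example 1

Reference: The C Programming Language, 2nd, Section 7.3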

#include <stdio.h>
#include <stdarg.h>

/* minprintf: minimal printf with variable argument list */
void minprintf(char *fmt, ...)
{
    va_list ap; /* points to each unnamed arg in turn */
    char *p, *sval;
    int ival;
    double dval;
    va_start(ap, fmt); /* make ap point to 1st unnamed arg */
    for (p = fmt; *p; p++) {
        if (*p != '%') {
            putchar(*p);
            continue;
        }
        switch (*++p) {
        case 'd':
            ival = va_arg(ap, int);
            printf("%d", ival);
            break;
        case 'f':
            dval = va_arg(ap, double);
            printf("%f", dval);
            break;
        case 's':
            for (sval = va_arg(ap, char *); *sval; sval++)
                putchar(*sval);
            break;
        default:
            putchar(*p);
            break;
        }
    }
    va_end(ap); /* clean up when done */
}

int main() {
    minprintf("%d", 12);
    return 0;
}

8.2.2 Variadic Function Example 2

Reference: http://www.cprogramming.com/tutorial/c/lesson17.html

#include <stdarg.h>
#include <stdio.h>

/* this function will take the number of values to average
   followed by all of the numbers to average */
double average ( int num, ... )
{
    va_list arguments;
    double sum = 0;

    /* Initializing arguments to store all values after num */
    va_start ( arguments, num );
    /* Sum all the inputs; we still rely on the function caller to tell us how
     * many there are */
    int x;
    for ( x = 0; x < num; x++ )
    {
        sum += va_arg ( arguments, double );
    }
    va_end ( arguments );                  // Cleans up the list

    return sum / num;
}

int main()
{
    /* this computes the average of 12.2, 22.3 and 4.5 (3 indicates the number of values to average) */
    printf( "%f\n", average ( 3, 12.2, 22.3, 4.5 ) );
    /* here it computes the average of the 5 values 3.3, 2.2, 1.1, 5.5 and 3.3 */
    printf( "%f\n", average ( 5, 3.3, 2.2, 1.1, 5.5, 3.3 ) );
}

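8.2.3 Passing Variable Arguments to Another Function (printf vs. vprintf)

How can variable arguments be passed on to another function? For example, how do you write a wrapper around printf?

The following version is wrong.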

void faterror(const char *fmt, ...)
{
    va_list argp;
    va_start(argp, fmt);
    puts("Other message.");
    printf(fmt, argp);        /* WRONG */
    va_end(argp);
    exit(EXIT_FAILURE);
}

The fix is to replace printf in the code above with vprintf, which accepts a va_list.

Reference: http://c-faq.com/varargs/handoff.html
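The same hand-off works with every v* function in the printf family. The sketch below uses vsnprintf (C99) rather than vprintf so that the result can be inspected; format_message is a hypothetical wrapper name.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into buf (size bytes) and returns whatever
   vsnprintf returns. */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list argp;
    int n;

    va_start(argp, fmt);
    n = vsnprintf(buf, size, fmt, argp);   /* pass the va_list, not "..." */
    va_end(argp);
    return n;
}
```

A wrapper that prints directly would call vprintf(fmt, argp) or vfprintf(stderr, fmt, argp) at the same point.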

8.3 Variadic Macros

How can a macro accept a variable number of arguments?

C89 offers no clean way to do this; C FAQ 10.26 lists some unsatisfying workarounds.

C99 supports variadic macros directly.
C99 introduces formal support for function-like macros with variable-length argument lists. The notation ... can appear at the end of the macro "prototype" (just as it does for varargs functions), and the pseudomacro __VA_ARGS__ in the macro definition is replaced by the variable arguments during invocation.

Example: a macro with variable arguments

#define XXX(...) fun1(__VA_ARGS__)

Reference: http://c-faq.com/cpp/varargs.html
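A slightly fuller sketch of a C99 variadic macro follows. FORMAT_INTO and describe_point are hypothetical names; the macro forwards a format string and its arguments to snprintf.

```c
#include <stdio.h>

/* Hypothetical macro: everything after size (the format string and its
   arguments) is substituted for __VA_ARGS__. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Hypothetical helper showing one invocation of the macro. */
int describe_point(char *out, size_t size, int x, int y)
{
    return FORMAT_INTO(out, size, "(%d, %d)", x, y);
}
```

Because the format string is part of the variable arguments here, __VA_ARGS__ is never empty, which C99 requires.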

8.4 Function Pointers

A function pointer is a variable that stores the address of a function that can later be called through that function pointer.

References:
http://www.cprogramming.com/tutorial/function-pointers.html
http://www.newty.de/fpt/fpt.html#chapter2

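8.4.1 Implicit Conversion of a Function Name to a Function Pointer

In C, a function name is implicitly converted to a pointer to the function.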

#include <stdio.h>

void fun1(void){
    printf("this is fun1\n");
}

int main()
{
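    printf("%p\n", (void *)fun1);   /* %p expects void *; converting a function pointer to void * is a common extension */
    printf("%p\n", (void *)&fun1);  /* prints the same address as the previous line */
    //printf("%p\n", &&fun1);       /* syntax error in standard C */

    fun1();            /* prints "this is fun1" */
    (*fun1)();         /* prints "this is fun1" */
    (**fun1)();        /* prints "this is fun1" */
    (***fun1)();       /* prints "this is fun1" */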
    return 0;
}

References:
http://stackoverflow.com/questions/6893285/why-do-all-these-crazy-function-pointer-definitions-all-work-what-is-really-goi
http://stackoverflow.com/questions/840501/how-do-function-pointers-in-c-work

8.4.2 Basic Use of Function Pointers

The following example shows the basic use of a function pointer:

#include <stdio.h>
void my_func(int x)
{
  printf("%d\n", x);
}

int main()
{
  void (*foo)(int);     /* declare foo as a pointer to a function */
  foo = &my_func;       /* a function name converts to a function pointer, so foo = my_func; also works */

  /* call my_func (note that you do not need to write (*foo)(2) ) */
  foo(2);
  /* but if you want to, you may */
  (*foo)(2);

  return 0;
}

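8.4.3 Function Pointer Example: the Last Parameter of qsort

The library function qsort is declared as follows (its last parameter is a function pointer):

#include <stdlib.h>

void qsort(void *base, size_t nmemb, size_t width, int (*compar)(const void *,const void *));
/* Parameters:
base    address of the first element of the array to sort
nmemb   number of elements to sort
width   size of each element in bytes
compar  pointer to a comparison function
*/

Note: because its last parameter is a function pointer, qsort is very general. It can sort a plain integer array, and it can also sort an array of structures by one of their fields.

Example: sorting a double array in ascending order with qsort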

#include <stdio.h>
#include <stdlib.h>

int cmp(const void *x, const void *y)
{
  double xx = *(const double*)x, yy = *(const double*)y;
  if (xx < yy) return -1;    /* a negative result when the first argument is smaller gives ascending order */
  if (xx > yy) return  1;
  return 0;
}

int main() {
  double arr[] = {9.03, 5, 1.56, 2, 0.2};
  int num = sizeof(arr)/sizeof(arr[0]);

  qsort(arr, num, sizeof(arr[0]), cmp);

  int i;
  for (i=0; i<num; i++) {
    printf("%f\n", arr[i]);
  }
  return 0;
}
/* Output:
0.200000
1.560000
2.000000
5.000000
9.030000
*/
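The same mechanism sorts an array of structures by one field. The sketch below is an illustration only; struct person, cmp_person_age, and sort_by_age are hypothetical names.

```c
#include <stdlib.h>

struct person {              /* hypothetical record type */
    const char *name;
    int age;
};

/* Comparison function: orders two struct person objects by ascending age. */
static int cmp_person_age(const void *x, const void *y)
{
    const struct person *a = x;
    const struct person *b = y;
    /* yields -1, 0, or 1 without the overflow risk of a->age - b->age */
    return (a->age > b->age) - (a->age < b->age);
}

/* Hypothetical helper: sorts people[0..n-1] in place by age. */
void sort_by_age(struct person *people, size_t n)
{
    qsort(people, n, sizeof people[0], cmp_person_age);
}
```

Sorting by another field only requires a different comparison function; the call to qsort stays the same.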

8.4.4 Function Pointer Example: Array of Function Pointers

#include <stdio.h>

void fun1() { printf("fun1\n"); }
void fun2() { printf("fun2\n"); }
void fun3(int x, int y) { printf("fun3, %d\n", x+y); }

int main() {

  /* declare and initialize an array of function pointers */
  void (*handlers[3])() = {
    fun1,
    fun2,
    (void (*)())fun3
  };

  /* call each function */
  handlers[0]();         /* may also be written as (*handlers[0])(); */
  (*handlers[1])();
  handlers[2](3, 4);

  return 0;
}

8.4.5 Function Pointer Example: Returning a Function Pointer

The following example shows a function that returns a function pointer:

#include <stdio.h>

float plus(float a, float b) { return a + b; }
float minus(float a, float b) { return a - b; }

float (*getFunc(const char opCode))(float, float)
{
   if(opCode == '+')
       return plus;           /* may also be written as return &plus; */
   else if (opCode == '-')
       return minus;
   else
       return NULL;
}

int main() {
    float (*fp)(float, float);

    fp = getFunc('+');
    printf("%f\n", fp(1.5, 1.2));      /* prints 2.700000 */

    fp = getFunc('-');
    printf("%f\n", (*fp)(1.5, 1.2));   /* prints 0.300000 */

    return 0;
}

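In the example above, getFunc returns a function pointer, but its declaration is hard to read. A typedef simplifies it into the more readable form below: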

typedef float (*funcSig)(float, float);

funcSig getFunc(const char opCode) {
   if(opCode == '+')
       return plus;
   else if (opCode == '-')
       return minus;
   else
       return NULL;
}

9 The C Preprocessor

References:
"The C Programming Language, 2nd" 4.11 The C Preprocessor
"ISO&IEC-9899-1999(E)" 6.10 Preprocessing directives
The GNU C Preprocessor: https://gcc.gnu.org/onlinedocs/cpp/

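9.1 File Inclusion (#include)

Any source line of the form:

#include "filename"   /* typically searches the directory of the current file first, then the system directories */

or

#include <filename>   /* searches only the system directories */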

is replaced by the contents of the file filename.

9.2 Macro Substitution (#define, #undef)

#define token replacement

Each token will be replaced by the replacement text.
Substitutions are made only for tokens, and do not take place within quoted strings.

Names may be undefined with #undef, usually to ensure that a routine is really a function, not a macro:

#undef getchar
int getchar(void) { ... }

9.2.1 Macro with arguments

It is also possible to define macros with arguments, so the replacement text can be different for different calls of the macro.
As an example, define a macro called max:

#define max(A, B) ((A) > (B) ? (A) : (B))

Each occurrence of a formal parameter (here A or B) will be replaced by the corresponding actual argument.

The line:

x = max(p+q, r+s);

will be replaced by the line:

x = ((p+q) > (r+s) ? (p+q) : (r+s));

If you examine the expansion of max, you will notice some pitfalls. The expressions are evaluated twice; this is bad if they involve side effects like increment operators or input and output. For instance

max(i++, j++);  /* WRONG */

will increment the larger twice.
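The double evaluation can be observed directly, and one way to avoid it is to use a function instead of a macro. In the sketch below, max_int, demo_macro, and demo_function are hypothetical names.

```c
#define max(A, B) ((A) > (B) ? (A) : (B))

/* Function alternative: each argument is evaluated exactly once. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Returns the final value of i after max(i++, j++) with i = 5, j = 3. */
int demo_macro(void)
{
    int i = 5, j = 3;
    int m = max(i++, j++);   /* expands to ((i++) > (j++) ? (i++) : (j++)) */
    (void)m;
    return i;                /* i was incremented twice */
}

/* Same call through the function: i is incremented once. */
int demo_function(void)
{
    int i = 5, j = 3;
    int m = max_int(i++, j++);
    (void)m;
    return i;
}
```

With these inputs demo_macro returns 7 and demo_function returns 6.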

9.2.2 The # operator

If a parameter name is preceded by a # in the replacement text, the combination will be expanded into a quoted string with the parameter replaced by the actual argument.

For example, a debugging print macro:

#define dprint(expr) printf(#expr " = %g\n", expr)

When this is invoked, as in

 dprint(x/y);

the macro is expanded into

 printf("x/y" " = %g\n", x/y);

and the strings are concatenated, so the effect is

 printf("x/y = %g\n", x/y);

9.2.3 The ## operator

The preprocessor operator ## provides a way to concatenate actual arguments during macro expansion.
If a parameter in the replacement text is adjacent to a ##, the parameter is replaced by the actual argument, the ## and surrounding white space are removed, and the result is rescanned.

For example, the macro paste concatenates its two arguments:

#define paste(front, back) front ## back

so paste(name, 1) creates the token name1.
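A common use of token pasting is generating families of similarly named functions. The sketch below is an illustration under assumed names: struct point and the DEFINE_GETTER macro are hypothetical.

```c
#define paste(front, back) front ## back

struct point {               /* hypothetical type */
    int x;
    int y;
};

/* Hypothetical macro: defines a function get_<field> that returns p->field. */
#define DEFINE_GETTER(field) \
    int paste(get_, field)(const struct point *p) { return p->field; }

DEFINE_GETTER(x)             /* defines int get_x(const struct point *p) */
DEFINE_GETTER(y)             /* defines int get_y(const struct point *p) */
```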

9.3 Conditional Inclusion (#if)

The #if line evaluates a constant integer expression (which may not include sizeof, casts, or enum constants). If the expression is non-zero, subsequent lines until an #endif or #elif or #else are included.

The #ifdef and #ifndef lines are specialized forms that test whether a name is defined.
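A short sketch of both forms follows. The typedef name i32, the macro BUFFER_SIZE, and the function buffer_size are hypothetical and serve only as an illustration.

```c
#include <limits.h>

/* #if with a constant integer expression: pick a type with at least 32 bits. */
#if INT_MAX >= 2147483647
typedef int i32;
#else
typedef long i32;
#endif

/* #ifndef: supply a default only if the name is not defined already,
   for example by a compiler option such as -DBUFFER_SIZE=512. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

int buffer_size(void)
{
    return BUFFER_SIZE;
}
```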

9.4 Line control (#line)

A #line directive sets the compiler's setting for the current file name and line number.

Directives #line alter the results of the __FILE__ and __LINE__ predefined macros from that point on.

It's also used by other tools that generate C source code, such as lex/flex and yacc/bison, so that error messages can refer to the input file rather than the (temporary) generated C code.

A line of the form:

#line number

sets the current line number.

A line of the form:

#line number "file-name"

sets both the line number and the file name.

References:
https://gcc.gnu.org/onlinedocs/cpp/Line-Control.html
http://stackoverflow.com/questions/7109540/line-keyword-in-c
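The effect on __LINE__ and __FILE__ can be checked with a small sketch. The file name generated.y and the two function names are made up for illustration.

```c
/* The line that follows the directive is treated as line 100 of "generated.y". */
#line 100 "generated.y"
int line_seen(void) { return __LINE__; }          /* line 100 */
const char *file_seen(void) { return __FILE__; }  /* line 101 */
```

Compiler diagnostics for the code after the directive would likewise refer to generated.y, which is how generated code points back to its original input.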

9.5 Error directive (#error)

A preprocessing directive of the form:

#error error-message

causes the implementation to produce a diagnostic message that includes the specified sequence of preprocessing tokens.

9.6 Pragmas (#pragma, _Pragma)

9.6.1 #pragma directive

The #pragma directive is the method specified by the C standard for providing additional information to the compiler, beyond what is conveyed in the language itself.

9.6.2 _Pragma operator

C99 introduces the _Pragma operator. This feature addresses a major problem with #pragma: being a directive, it cannot be produced as the result of macro expansion. _Pragma is an operator, much like sizeof or defined, and can be embedded in a macro.

Its syntax is

_Pragma (string-literal)

where string-literal is destringized, by replacing all '\\' with a single '\' and all '\"' with a '"'.

For example,

_Pragma ("GCC dependency \"parse.y\"")

has the same effect as

#pragma GCC dependency "parse.y".

The same effect could be achieved using macros, for example

#define DO_PRAGMA(x) _Pragma (#x)
DO_PRAGMA (GCC dependency "parse.y")

Note: #pragma is a directive and cannot be produced by macro expansion; _Pragma is an operator and can appear in a macro expansion.

10 Predefined Macros and Identifiers

10.1 Predefined Macros

C predefines several macros for use by programs.

Table 10: Predefined macros in C
C predefined macro Description
__DATE__ A character string literal of the form "Mmm dd yyyy"
__FILE__ Current source file (a character string literal)
__LINE__ Line number (within the current source file) of the current source line (an integer constant)
__STDC__ The integer constant 1, intended to indicate a conforming implementation
__STDC_HOSTED__ Added in C99. The integer constant 1 if the implementation is a hosted implementation or the integer constant 0 if it is not.
__STDC_VERSION__ The integer constant 199901L in C99 (the macro was introduced by Amendment 1 of 1995 with the value 199409L).
__TIME__ A character string literal of the form "hh:mm:ss" as in the time generated by the asctime function.

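Note: what is a hosted environment?
A hosted environment has the complete facilities of the standard C library available.

The following three macros were added in C99. They are conditionally defined: an implementation defines them only if it supports the corresponding feature.

Table 11: Conditionally defined macros added in C99
Macro Description
__STDC_IEC_559__ Defined as 1 if IEC 60559 floating-point arithmetic is supported
__STDC_IEC_559_COMPLEX__ Defined as 1 if IEC 60559 compatible complex arithmetic is supported
__STDC_ISO_10646__ An integer constant of the form yyyymmL, indicating that values of type wchar_t are the coded representations of the characters defined by ISO/IEC 10646 as of the specified year and month

Reference: ISO&IEC-9899-1999(E) 6.10.8 Predefined macro names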

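10.2 Predefined Identifiers

__func__ is an identifier predefined inside every function body; it holds the name of the enclosing function.
It behaves as if the following declaration appeared at the start of each function definition:

static const char __func__[] = "function-name";

Example: calling the function below prints its name, myfunc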

#include <stdio.h>
void myfunc(void)
{
    printf("%s\n", __func__);
    /* ... */
}

Reference: ISO&IEC-9899-1999(E) 6.4.2.2 Predefined identifiers
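__func__ is often combined with __LINE__ in logging macros. The sketch below is one possible shape; WHERE, where_am_i, and current_function are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical macro: writes "function:line" of the place where it is
   expanded into buf. */
#define WHERE(buf, size) snprintf((buf), (size), "%s:%d", __func__, __LINE__)

/* __func__ inside the expansion names the function that uses the macro. */
int where_am_i(char *buf, size_t size)
{
    return WHERE(buf, size);
}

const char *current_function(void)
{
    return __func__;         /* "current_function" */
}
```

A macro is needed here: if WHERE were a function, __func__ and __LINE__ would describe that function rather than its caller.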

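11 Text and String Processing

11.1 strncat

The prototype of strncat is:

char *strncat(char *dest, const char *src, size_t n);

strncat always terminates the result in dest with '\0'.
Consequently, if src holds n or more characters, strncat writes n+1 bytes to dest (n characters plus the terminating '\0').

Important: strncat is not a safe version of strcat (if dest is too small, strncat still overflows it). It only provides "append at most the first n characters of src to the end of dest".

char buf[4] = "ab";
char *src = "1234567890";
strncat(buf, src, 8);     /* WRONG: unsafe, because buf is too small */

The following call is also wrong: when src is long, it can write one byte past the end of dest.
strncat(dest, src, sizeof(dest) - strlen(dest));

The usual idiom is shown below. It is safe when dest is an array, so that sizeof(dest) is its capacity, but when src is long only part of src is appended to dest:

strncat(dest, src, sizeof(dest) - 1 - strlen(dest));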
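When dest is a pointer rather than an array, sizeof(dest) no longer gives the capacity, so the size has to be passed explicitly. The helper below is a hypothetical sketch (append_str is not a library function); it assumes dest already holds a null-terminated string shorter than dest_size.

```c
#include <string.h>

/* Hypothetical helper: appends src to dest, whose total capacity is
   dest_size bytes. Returns 1 if all of src fit, 0 if it was truncated. */
int append_str(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t room = dest_size - 1 - used;   /* keep one byte for '\0' */

    strncat(dest, src, room);             /* writes at most room + 1 bytes */
    return strlen(src) <= room;
}
```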

11.1.1 A Simple Implementation of strncat

The following implementation is taken from man strncat

char*
strncat(char *dest, const char *src, size_t n)
{
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0 ; i < n && src[i] != '\0' ; i++) {
        dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';

    return dest;
}

11.2 strncpy, snprintf

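The prototype of strncpy is:

char *strncpy(char *dest, const char *src, size_t n);

It copies at most the first n bytes of the string src to the memory starting at dest, and returns dest.

Note that strncpy does not append a \0 when src holds n or more characters. For example: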

#include<stdio.h>
#include<string.h>

int main() {

  char a[11] = "aaaaaaaaaa";        /* 10 characters plus the terminating '\0' */
  strncpy(a, "01234567890abcd", 5);

  printf("%s\n", a);                /* prints 01234aaaaa */
  return 0;
}

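A use of strncpy that may be incorrect (it is fine if dest was zero-initialized beforehand):
strncpy(dest, src, sizeof(dest) - 1);
It does not guarantee that dest is null-terminated unless the last byte of dest was already 0 before the call.

Safe use of strncpy:

strncpy(dest, src, sizeof(dest));
dest[sizeof(dest)-1] = '\0';       // always set the last byte to \0 by hand

Notes on strncpy:
1. To prevent overflow, n should be sizeof(dest) or sizeof(dest)-1; do not use sizeof(src) by mistake.
2. Termination. Always set the last byte of dest to \0 by hand. strncpy pads with n-strlen(src) bytes of \0 only when src is shorter than n.
3. Performance. When dest is much larger than src, strncpy(dest, src, sizeof(dest)); fills every remaining byte with \0, which costs time.
4. Return value. strncpy returns dest, so the caller cannot tell how many bytes were copied.

snprintf can also be used to copy a string.
Correct use of snprintf:

snprintf(dest, sizeof(dest), "%s", src);

Notes on snprintf:
1. Do not drop the "%s" argument and pass src as the format string: if src contains %, the behavior is undefined and the program may crash.
2. Performance. When src is much longer than dest, snprintf still scans all of src, because it must return the full length of src.
3. Return value. If the buffer is large enough, snprintf returns the number of characters written, not counting the \0; otherwise it returns the number of characters that would have been written.

Summary of strncpy and snprintf:
1. snprintf is more concise to use than strncpy.
2. snprintf reports how many bytes the copy needs.
3. Both have performance pitfalls.
4. strncpy may leave dest unterminated; snprintf always terminates the result with '\0' when its size argument is nonzero.

Reference: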
http://www.jb51.net/article/39994.htm

11.2.1 A Simple Implementation of strncpy

The following implementation is taken from man strncpy

char*
strncpy(char *dest, const char *src, size_t n){
    size_t i;

    for (i = 0 ; i < n && src[i] != '\0' ; i++) {
        dest[i] = src[i];
    }
    for ( ; i < n ; i++) {
        dest[i] = '\0';
    }

    return dest;
}

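11.2.2 Return Value of snprintf

The prototype of snprintf is:

int snprintf(char *str, size_t size, const char *format, ...);

snprintf writes at most size bytes, including the terminating \0, so at most size - 1 bytes of output are stored. Its return value is the number of bytes the complete output would need, not counting the \0.
The return value can be used to detect truncation: if it is greater than or equal to the second argument size, the output was truncated.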

char abc[20] = "1234567";
printf("snprintf return %d\n", snprintf(abc, 10, "xyz_%s", "12345678"));
// The statement above prints "snprintf return 12"; 12 is the length of xyz_12345678
printf("abc is %s\n",abc);
// The statement above prints "abc is xyz_12345": only 9 bytes are stored, and the 10th byte is \0
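The truncation check can be wrapped in a small helper. The function copy_str below is a hypothetical sketch, not a library function.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dest (dest_size bytes in total).
   Returns 0 on success, 1 if the copy was truncated, -1 if snprintf failed. */
int copy_str(char *dest, size_t dest_size, const char *src)
{
    int n = snprintf(dest, dest_size, "%s", src);

    if (n < 0)
        return -1;                    /* output or encoding error */
    return (size_t)n >= dest_size;    /* needed length did not fit */
}
```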

11.3 strspn, strcspn, strpbrk

strspn stands for string span
strcspn stands for string complement span
strpbrk stands for string pointer break

Reference: http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Search-Functions.html

11.3.1 strspn

Prototype of strspn:

size_t strspn(const char *s1, const char *s2);

The strspn() function returns the number of bytes in the initial segment of s1 which consist only of bytes from s2.

strspn example:

#include <string.h>
#include <stdio.h>
int main()
{
    printf("%zu\n",strspn("LinuxLinux","niLux"));   // prints 10
    printf("%zu\n",strspn("LinuxLinux","Lkix"));    // prints 2
    printf("%zu\n",strspn("aLinuxLinux","Lkix"));   // prints 0
    return 0;
}

11.3.2 strcspn

Prototype of strcspn:

size_t strcspn(const char *s1, const char *s2);

The strcspn() function returns the number of bytes in the initial segment of s1 which are not in the string s2.

#include <stdio.h>
#include <string.h>
int main()
{
    char *s="Golden Global View";
    char *r="new";
    size_t n=strcspn(s,r);
    printf("The first char both in s1 and s2 is: %c\n",s[n]); // prints: The first char both in s1 and s2 is: e
    return 0;
}

Reference: http://baike.baidu.com/view/1028539.htm

11.3.3 strpbrk

Prototype of strpbrk:

char *strpbrk(const char *s1, const char *s2);

The strpbrk() function returns a pointer to the byte in s1 that matches one of the bytes in s2, or NULL if no such byte is found.
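A minimal sketch of strpbrk in use; first_vowel is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the first lowercase vowel in s,
   or NULL if s contains none. */
const char *first_vowel(const char *s)
{
    return strpbrk(s, "aeiou");
}
```

For the string "rhythm and blues" the result points at the 'a' of "and".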

11.4 strtok

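strtok splits a string into a sequence of tokens separated by the given delimiter characters.

On the first call, pass the string to split as the argument s; on subsequent calls, pass NULL as s. Each successful call returns a pointer to the next token.

Note: strtok is not thread-safe, because it keeps the current parse position in static storage. strtok_r is the reentrant version of strtok.

11.4.1 strtok Example 1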

#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Life is like, a box of chocolate, you never, know what you're, gonna get";
    char delims[] = ",";
    char *result;

    result = strtok( str, delims );
    while(result != NULL){
        printf("%s \n", result);
        result = strtok( NULL, delims);
    }
    return 0;
}

11.4.2 strtok Example 2

#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Life is like, a box of chocolate, you never, know what you're, gonna get";
    char delims[] = ",";
    char *result;

    for( result = strtok(str, delims); result != NULL; result = strtok(NULL, delims))
        printf("%s \n", result);
    return 0;
}

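11.4.3 strtok Modifies the String Being Split

After strtok has been called, the string passed as its first argument is no longer the same as before the call.

Example: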

#include<string.h>
#include<stdio.h>
int main(void)
{
    char input[16]="abc,d";
    char*p;
    printf("before strtok. input is %s\n", input);

    p=strtok(input,",");
    if(p)
        printf("%s\n",p);
    printf("after strtok. input is %s\n", input);

    p=strtok(NULL,",");  /* if the first argument here were input instead of NULL, strtok would return abc again */
    if(p)
        printf("%s\n",p);
    printf("after strtok. input is %s\n", input);

    return 0;
}

The program above prints:
before strtok. input is abc,d
abc
after strtok. input is abc
d
after strtok. input is abc

11.4.4 strtok Returns Only Non-empty Strings (Skips Empty Fields)

strtok silently skips consecutive empty fields.

The tokens returned by strtok() are always nonempty strings.
Thus, for example, given the string "aaa;;bbb,", successive calls to strtok() that specify the delimiter string ";," would return the strings "aaa" and "bbb", and then a NULL pointer.
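The behavior described above can be checked with a short sketch; count_tokens is a hypothetical helper. Like every use of strtok, it modifies its input, so the argument must be a writable array.

```c
#include <string.h>

/* Hypothetical helper: returns the number of tokens strtok finds in s.
   s is modified in the process. */
int count_tokens(char *s, const char *delims)
{
    int n = 0;
    char *tok;

    for (tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

For "aaa;;bbb," with the delimiters ";," the count is 2: the empty field between the two semicolons and the one after the trailing comma are not reported.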

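11.4.5 strsep (Less Portable)

strsep is preferable to strtok: it keeps no hidden state, so it is thread-safe. Its only drawback is that it is less portable than strtok.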

The strsep() function was introduced as a replacement for strtok(), since the latter cannot handle empty fields. However, strtok() conforms to C89/C99 and hence is more portable.

11.4.6 strtok_r

strtok is not thread-safe; strtok_r is.

An example implementation of strtok_r:

#include <string.h>
/* Parse s into tokens separated by characters in delim.
   If s is NULL, the saved pointer in *save_ptr is used as
   the next starting point.  For example:
     char s[] = "-abc-=-def";
     char *sp;
     x = strtok_r(s, "-", &sp);      // x = "abc", sp = "=-def"
     x = strtok_r(NULL, "-=", &sp);  // x = "def", sp = "" (end of string)
     x = strtok_r(NULL, "=", &sp);   // x = NULL
                                     // s = "-abc\0=-def\0"
*/
char *strtok_r(char *s, const char *delim, char **save_ptr) {
    char *token;

    if (s == NULL) s = *save_ptr;

    /* Scan leading delimiters.  */
    s += strspn(s, delim);
    if (*s == '\0')
        return NULL;

    /* Find the end of the token.  */
    token = s;
    s = strpbrk(token, delim);
    if (s == NULL)
        /* This token finishes the string.  */
        *save_ptr = strchr(token, '\0');
    else {
        /* Terminate the token and make *SAVE_PTR point past it.  */
        *s = '\0';
        *save_ptr = s + 1;
    }

    return token;
}

Reference: http://blog.csdn.net/sjin_1314/article/details/8242098

11.4.6.1 A strtok_r Variant That Keeps Empty Fields

The following implements a version of strtok_r that keeps empty fields:

/*
 * This function works like strtok_r, but preserves empty fields (strtok_r skips them).
 *
 * For example:
 * strtok_r:         a;;;d => "a", "d"
 * this function:    a;;;d => "a", "", "", "d"
 */
char *strtok_r_no_skip(char *str, const char *delims, char **store) {

    char *ret;

    if (str == NULL)
        str = *store;

    if (*str == '\0')
        return NULL;

    ret = str;

    str += strcspn(str, delims);

    if (*str != '\0')
        *str++ = '\0';

    *store = str;

    return ret;
}

11.5 memcpy

memcpy copies n bytes from the memory at address src to the memory at address dest.

#include <string.h>
void *memcpy(void *dest, const void *src, size_t n);

                                 /* returns a pointer to dest. */

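Note: if the src and dest regions overlap, the behavior of memcpy is undefined and memmove should be used instead. Some memcpy implementations happen to handle overlap correctly, but portable code must not rely on that.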
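A typical case of overlapping regions is shifting data within one buffer. The sketch below uses memmove for that; prepend_char is a hypothetical helper and assumes buf has room for at least strlen(buf) + 2 bytes.

```c
#include <string.h>

/* Hypothetical helper: inserts c in front of the string stored in buf.
   The caller must ensure buf can hold one more character. */
void prepend_char(char *buf, char c)
{
    /* Source and destination overlap, so memmove is required here.
       The + 1 moves the terminating '\0' as well. */
    memmove(buf + 1, buf, strlen(buf) + 1);
    buf[0] = c;
}
```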

11.5.1 A Basic memcpy Implementation

Below is a basic implementation of memcpy; it can produce wrong results when the src and dest regions overlap.

void * memcpy (void * dest, const void *src, size_t n) {
  char *pDest = (char *) dest;
  const char *pSrc = (const char *) src;

  size_t i=0;
  for (; i < n; i++) {
    pDest[i] = pSrc[i];
  }

  return dest;
}

11.5.2 A memcpy Implementation That Handles Overlap

Below is an implementation of memcpy that handles overlapping src and dest regions.

void * memcpy (void * dest, const void *src, size_t n) {
  char *pDest = (char *) dest;
  const char *pSrc = (const char *) src;

  size_t i;
  if ( (unsigned long)pDest < (unsigned long)pSrc ) { /* Copy forward */
    for (i = 0; i < n; i++) {
      pDest[i] = pSrc[i];
    }
  } else {                                            /* Copy backward */
    for (i = n; i > 0; i--) {                         /* i is unsigned, so count down from n and index with i-1 */
      pDest[i-1] = pSrc[i-1];
    }
  }

  return dest;
}

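11.5.3 A Faster memcpy (Copying Several Bytes at a Time)

The following example copies 4 bytes at a time; the trailing bytes that do not fill a 4-byte word are copied one byte at a time.

#include <stdint.h>

// Alignment is not handled yet.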
void memcpy(void* dest, void* src, int size)
{
  uint8_t *pdest = (uint8_t*) dest;
  uint8_t *psrc = (uint8_t*) src;

  int loops = (size / sizeof(uint32_t));
  int index;
  for(index = 0; index < loops; ++index) {
    *((uint32_t*)pdest) = *((uint32_t*)psrc);
    pdest += sizeof(uint32_t);
    psrc += sizeof(uint32_t);
  }

  loops = (size % sizeof(uint32_t));
  for (index = 0; index < loops; ++index) {
    *pdest = *psrc;
    ++pdest;
    ++psrc;
  }
}

References:
http://stackoverflow.com/questions/11876361/implementing-own-memcpy-size-in-bytes
http://opensource.apple.com//source/xnu/xnu-2050.18.24/libsyscall/wrappers/memcpy.c
http://codereview.stackexchange.com/questions/41094/memcpy-implementation

11.6 Trimming Leading and Trailing Whitespace

#include <ctype.h>
#include <string.h>

// Note: This function returns a pointer to a substring of the original string.
// If the given string was allocated dynamically, the caller must not overwrite
// that pointer with the returned value, since the original pointer must be
// deallocated using the same allocator with which it was allocated.  The return
// value must NOT be deallocated using free() etc.
char *trimwhitespace(char *str)
{
  char *end;

  // Trim leading space
  while(isspace((unsigned char)*str))
     str++;

  if(*str == 0)  // All spaces?
    return str;

  // Trim trailing space
  end = str + strlen(str) - 1;
  while(end > str && isspace((unsigned char)*end))
    end--;

  // Write new null terminator
  *(end+1) = 0;

  return str;
}

Reference: http://stackoverflow.com/questions/122616/how-do-i-trim-leading-trailing-whitespace-in-a-standard-way

11.7 Reading and Writing Text Files

11.7.1 Processing Single Characters

Common functions for processing single characters:

int fgetc(FILE *stream);
int getc(FILE *stream);         // same as fgetc, but may be implemented as a macro
int getchar(void);              // equivalent to getc(stdin)
int ungetc(int c, FILE *stream);

int fputc(int c, FILE *stream);
int putc(int c, FILE *stream);  // same as fputc, but may be implemented as a macro
int putchar(int c);             // equivalent to putc(c, stdout)

How do fgetc/fputc differ from getc/putc?
Reference: http://stackoverflow.com/questions/14008907/fputc-vs-putc-in-c

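11.7.2 Processing Multiple Characters or a Line

Common functions for processing multiple characters or a whole line:

char *fgets(char *s, int size, FILE *stream); // fgets stops at a newline or EOF; the newline, if read, is stored in s.
                                              // fgets reads at most size-1 characters (it always appends '\0').
char *gets(char *s);      // reads a line from stdin without storing the newline; unsafe (no bounds check) and removed in C11

int fputs(const char *s, FILE *stream);
int puts(const char *s);   // writes a line to stdout and appends a trailing newline

Example: read a text file line by line and print it (assuming no line exceeds 1023 bytes)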

#include <stdio.h>

int main() {
    char line[1024];
    FILE *fp = fopen("filename.txt","r");
    if( fp == NULL ) {
        return 1;
    }
    while( fgets(line,sizeof(line),fp) ) {
        printf("%s",line);    /* line already ends with the newline read by fgets */
    }
    fclose(fp);
    return 0;
}

11.7.3 fflush vs fsync

#include <stdio.h>
int fflush(FILE *stream);

#include <unistd.h>
int fsync(int fd);

fflush() works on FILE* , it just flushes the internal buffers in the FILE* of your application out to the OS.
fsync() works on a lower level, it tells the OS to flush its buffers to the physical media.

To close a FILE and make sure its contents are written to disk right away:

fflush(gfile);
tmp_fd = fileno(gfile);
if (tmp_fd != -1) {
    fsync(tmp_fd);
}
fclose(gfile);

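Reference: http://stackoverflow.com/questions/2340610/difference-between-fflush-and-fsync

12 Strong and Weak Symbols

In C, functions and initialized global variables (including those initialized to 0) are strong symbols; uninitialized global variables are weak symbols.

The linker handles strong and weak symbols according to three rules:
① Only one strong symbol with a given name is allowed; otherwise the linker reports a "multiple definition" error.
② One strong symbol and several weak symbols with the same name are allowed; the definition of the strong symbol is chosen.
③ When several weak symbols share a name, the linker chooses the one that occupies the most memory.

References: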
http://blog.csdn.net/astrotycoon/article/details/8008629

13 Standard Library (C90)

Reference: "The C Programming Language, 2nd" Appendix B - Standard Library

13.1 Input and Output: <stdio.h>

See table 12, table 13, table 14, table 15, table 16, table 17.

For a summary of input and output functions, please refer to man stdio

Table 12: File Operations
Function Description
FILE *fopen(const char *filename, const char *mode) fopen opens the named file, and returns a stream, or NULL if the attempt fails.
FILE *freopen(const char *filename, const char *mode, FILE *stream) freopen opens the file with the specified mode and associates the stream with it. freopen is normally used to change the files associated with stdin, stdout, or stderr.
int fflush(FILE *stream) On an output stream, fflush causes any buffered but unwritten data to be written; on an input stream, the effect is undefined. fflush(NULL) flushes all output streams.
int fclose(FILE *stream) fclose flushes any unwritten data for stream, discards any unread buffered input, frees any automatically allocated buffer, then closes the stream.
int remove(const char *filename) remove removes the named file.
int rename(const char *oldname, const char *newname) rename changes the name of a file.
FILE *tmpfile(void) tmpfile creates a temporary file of mode "wb+" that will be automatically removed when closed or when the program terminates normally.
char *tmpnam(char s[L_tmpnam]) tmpnam(NULL) creates a string that is not the name of an existing file, and returns a pointer to an internal static array. tmpnam(s) stores the string in s as well as returning it as the function value.
int setvbuf(FILE *stream, char *buf, int mode, size_t size) setvbuf controls buffering for the stream; it must be called before reading, writing or any other operation. A mode of _IOFBF causes full buffering, _IOLBF line buffering of text files, and _IONBF no buffering. If buf is not NULL, it will be used as the buffer, otherwise a buffer will be allocated. size determines the buffer size.
void setbuf(FILE *stream, char *buf) If buf is NULL, buffering is turned off for the stream. Otherwise, setbuf is equivalent to (void) setvbuf(stream, buf, _IOFBF, BUFSIZ).
Table 13: Formatted Output/Input
Function Description
int fprintf(FILE *stream, const char *format, …) Converts and writes output to stream under the control of format.
int printf(const char *format, …) Equivalent to fprintf(stdout, …).
int sprintf(char *s, const char *format, …) Same as printf except that the output is written into the string s.
int vprintf(const char *format, va_list arg) See stdarg(3).
int vfprintf(FILE *stream, const char *format, va_list arg) See stdarg(3).
int vsprintf(char *s, const char *format, va_list arg) See stdarg(3).
int fscanf(FILE *stream, const char *format, …) Reads from stream under control of format.
int scanf(const char *format, …) Identical to fscanf(stdin, …)
int sscanf(const char *s, const char *format, …) Same as scanf except input is taken from string s.
Table 14: Character Input and Output Functions
Function Description
int fgetc(FILE *stream) fgetc returns the next character of stream as an unsigned char (converted to an int), or EOF if end of file or error occurs.
char *fgets(char *s, int n, FILE *stream) fgets reads at most the next n-1 characters into the array s, stopping if a newline is encountered; the newline is included in the array
int fputc(int c, FILE *stream) fputc writes the character c (converted to an unsigned char) on stream.
int fputs(const char *s, FILE *stream) fputs writes the string s (which need not contain \n) on stream.
int getc(FILE *stream) getc is equivalent to fgetc except that if it is a macro, it may evaluate stream more than once.
int getchar(void) getchar is equivalent to getc(stdin).
char *gets(char *s) gets reads the next input line into the array s; it replaces the terminating newline with '\0'.
int putc(int c, FILE *stream) putc is equivalent to fputc except that if it is a macro, it may evaluate stream more than once.
int putchar(int c) putchar(c) is equivalent to putc(c,stdout).
int puts(const char *s) puts writes the string s and a newline to stdout.
int ungetc(int c, FILE *stream) ungetc pushes c (converted to an unsigned char) back onto stream, where it will be returned on the next read.
Table 15: Direct Input and Output Functions
Function Description
size_t fread(void *ptr, size_t size, size_t nobj, FILE *stream) fread reads from stream into the array ptr at most nobj objects of size size.
size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *stream) fwrite writes, from the array ptr, nobj objects of size size on stream.
Table 16: File Positioning Functions
Function Description
int fseek(FILE *stream, long offset, int origin) fseek sets the file position for stream.
long ftell(FILE *stream) ftell returns the current file position for stream, or -1 on error.
void rewind(FILE *stream) rewind(fp) is equivalent to fseek(fp, 0L, SEEK_SET); clearerr(fp).
int fgetpos(FILE *stream, fpos_t *ptr) fgetpos records the current position in stream in *ptr, for subsequent use by fsetpos.
int fsetpos(FILE *stream, const fpos_t *ptr) fsetpos positions stream at the position recorded by fgetpos in *ptr.
Table 17: Error Functions
Function Description
void clearerr(FILE *stream) clearerr clears the end of file and error indicators for stream.
int feof(FILE *stream) feof returns non-zero if the end of file indicator for stream is set.
int ferror(FILE *stream) ferror returns non-zero if the error indicator for stream is set.
void perror(const char *s) perror prints s and the message corresponding to errno, like fprintf(stderr, "%s: %s\n", s, "error message");

13.2 Character Class Test: <ctype.h>

See table 18.

Table 18: Character Class Test: <ctype.h>
Function Description
isalnum(c) isalpha(c) or isdigit(c) is true
isalpha(c) isupper(c) or islower(c) is true
iscntrl(c) control character
isdigit(c) decimal digit
isgraph(c) printing character except space
islower(c) lower-case letter
isprint(c) printing character including space
ispunct(c) printing character except space or letter or digit
isspace(c) space, formfeed, newline, carriage return, tab, vertical tab
isupper(c) upper-case letter
isxdigit(c) hexadecimal digit
int tolower(c) convert c to lower case
int toupper(c) convert c to upper case

13.3 String Functions: <string.h>

See table 19.

Table 19: String Functions
Function Description
char *strcpy(s,ct) copy string ct to string s, including '\0'; return s.
char *strncpy(s,ct,n) copy at most n characters of string ct to s; return s. Pad with '\0''s if ct has fewer than n characters.
char *strcat(s,ct) concatenate string ct to end of string s; return s.
char *strncat(s,ct,n) concatenate at most n characters of string ct to string s, terminate s with '\0'; return s.
int strcmp(cs,ct) compare string cs to string ct, return <0 if cs<ct, 0 if cs==ct, or >0 if cs>ct.
int strncmp(cs,ct,n) compare at most n characters of string cs to string ct; return <0 if cs<ct, 0 if cs==ct, or >0 if cs>ct.
char *strchr(cs,c) return pointer to first occurrence of c in cs or NULL if not present.
char *strrchr(cs,c) return pointer to last occurrence of c in cs or NULL if not present.
size_t strspn(cs,ct) return length of prefix of cs consisting of characters in ct.
size_t strcspn(cs,ct) return length of prefix of cs consisting of characters not in ct.
char *strpbrk(cs,ct) return pointer to first occurrence in string cs of any character of string ct, or NULL if not present.
char *strstr(cs,ct) return pointer to first occurrence of string ct in cs, or NULL if not present.
size_t strlen(cs) return length of cs.
char *strerror(n) return pointer to implementation-defined string corresponding to error n.
char *strtok(s,ct) strtok searches s for tokens delimited by characters from ct.
void *memcpy(s,ct,n) copy n characters from ct to s, and return s.
void *memmove(s,ct,n) same as memcpy except that it works even if the objects overlap.
int memcmp(cs,ct,n) compare the first n characters of cs with ct; return as with strcmp.
void *memchr(cs,c,n) return pointer to first occurrence of character c in cs, or NULL if not present among the first n characters.
void *memset(s,c,n) place character c into first n characters of s, return s.

13.4 Utility Functions: <stdlib.h>

See table 20.

Table 20: Utility Functions
Function Description
double atof(const char *s) atof converts s to double; it is equivalent to strtod(s, (char**)NULL).
int atoi(const char *s) converts s to int; it is equivalent to (int)strtol(s, (char**)NULL, 10).
long atol(const char *s) converts s to long; it is equivalent to strtol(s, (char**)NULL, 10).
double strtod(const char *s, char **endp) strtod converts the prefix of s to double, ignoring leading white space; it stores a pointer to any unconverted suffix in *endp unless endp is NULL.
long strtol(const char *s, char **endp, int base) strtol converts the prefix of s to long, ignoring leading white space; it stores a pointer to any unconverted suffix in *endp unless endp is NULL.
unsigned long strtoul(const char *s, char **endp, int base) strtoul is the same as strtol except that the result is unsigned long and the error value is ULONG_MAX.
int rand(void) rand returns a pseudo-random integer in the range 0 to RAND_MAX, which is at least 32767.
void srand(unsigned int seed) srand uses seed as the seed for a new sequence of pseudo-random numbers. The initial seed is 1.
void *calloc(size_t nobj, size_t size) calloc returns a pointer to space for an array of nobj objects, each of size size, or NULL if the request cannot be satisfied. The space is initialized to zero bytes.
void *malloc(size_t size) malloc returns a pointer to space for an object of size size, or NULL if the request cannot be satisfied. The space is uninitialized.
void *realloc(void *p, size_t size) realloc changes the size of the object pointed to by p to size.
void free(void *p) free deallocates the space pointed to by p; it does nothing if p is NULL.
void abort(void) abort causes the program to terminate abnormally, as if by raise(SIGABRT).
void exit(int status) exit causes normal program termination.
int atexit(void (*fcn)(void)) atexit registers the function fcn to be called when the program terminates normally.
int system(const char *s) system passes the string s to the environment for execution.
char *getenv(const char *name) getenv returns the environment string associated with name, or NULL if no string exists.
void *bsearch(const void *key, const void *base, size_t num, size_t size, int (*cmp)(const void *, const void *)) bsearch searches the sorted array pointed to by base (num elements, each of size bytes) for an element that matches *key, using cmp to compare; it returns a pointer to a matching element, or NULL if none is found.
void qsort(void *base, size_t n, size_t size, int (*cmp)(const void *, const void *)) qsort sorts the n elements of the array pointed to by base, each size bytes long, into ascending order, using the comparison function cmp to determine the order.
int abs(int n) abs returns the absolute value of its int argument.
long labs(long n) labs returns the absolute value of its long argument.
div_t div(int num, int denom) div computes the quotient and remainder of num/denom. The results are stored in the int members quot and rem of a structure of type div_t.
ldiv_t ldiv(long num, long denom) ldiv computes the quotient and remainder of num/denom. The results are stored in the long members quot and rem of a structure of type ldiv_t.

13.5 Non-local Jumps: <setjmp.h>

The goto statement can only jump within a single function, which may be called a local jump. setjmp and longjmp make non-local jumps possible.

The declarations in <setjmp.h> provide a way to avoid the normal function call and return sequence, typically to permit an immediate return from a deeply nested function call.

int setjmp(jmp_buf env)
    The macro setjmp saves state information in env for use by longjmp. The return is zero from a direct call of setjmp, and non-zero from a subsequent call of longjmp. A call to setjmp can only occur in certain contexts, basically the test of if, switch, and loops, and only in simple relational expressions.
    if (setjmp(env) == 0)
        /* get here on direct call */
    else
        /* get here by calling longjmp */

void longjmp(jmp_buf env, int val)
    longjmp restores the state saved by the most recent call to setjmp, using the information saved in env, and execution resumes as if the setjmp function had just executed and returned the non-zero value val. The function containing the setjmp must not have terminated. Accessible objects have the values they had at the time longjmp was called, except that non-volatile automatic variables in the function calling setjmp become undefined if they were changed after the setjmp call.

An example of setjmp and longjmp:

/* setjmp example: error handling */
#include <stdio.h>      /* printf, scanf */
#include <stdlib.h>     /* exit */
#include <setjmp.h>     /* jmp_buf, setjmp, longjmp */

int main()
{
  jmp_buf env;
  int val;

  val = setjmp (env);
  if (val) {
    fprintf (stderr, "Error %d happened\n", val);
    exit (val);
  }

  /* code here */

  longjmp (env, 101);   /* signal an error when something goes wrong; execution resumes where setjmp was called */

  return 0;
}

The program above prints:

Error 101 happened

Reference: http://www.cplusplus.com/reference/csetjmp/setjmp/

13.5.1 Implementing setjmp and longjmp

The implementation of setjmp and longjmp depends on the platform.

FreeBSD implements setjmp and longjmp on x86-64 as follows:

/*****************************************************************************/
/* setjump, longjump                                                         */
/*****************************************************************************/

ENTRY(setjmp)
        movq    %rbx,0(%rdi)                    /* save rbx */
        movq    %rsp,8(%rdi)                    /* save rsp */
        movq    %rbp,16(%rdi)                   /* save rbp */
        movq    %r12,24(%rdi)                   /* save r12 */
        movq    %r13,32(%rdi)                   /* save r13 */
        movq    %r14,40(%rdi)                   /* save r14 */
        movq    %r15,48(%rdi)                   /* save r15 */
        movq    0(%rsp),%rdx                    /* get return address */
        movq    %rdx,56(%rdi)                   /* save return address */
        xorl    %eax,%eax                       /* return(0); */
        ret
END(setjmp)

ENTRY(longjmp)
        movq    0(%rdi),%rbx                    /* restore rbx */
        movq    8(%rdi),%rsp                    /* restore rsp */
        movq    16(%rdi),%rbp                   /* restore rbp */
        movq    24(%rdi),%r12                   /* restore r12 */
        movq    32(%rdi),%r13                   /* restore r13 */
        movq    40(%rdi),%r14                   /* restore r14 */
        movq    48(%rdi),%r15                   /* restore r15 */
        movq    56(%rdi),%rdx                   /* get return address */
        movq    %rdx,0(%rsp)                    /* restore return address */
        xorl    %eax,%eax                       /* return(1); */
        incl    %eax
        ret
END(longjmp)

All setjmp is doing is saving a bunch of registers including %rsp (the stack pointer) and the return address into the jmp_buf array that was passed as a parameter. All longjmp is doing is restoring those registers and the return address of the original setjmp call.

x86-64 has 16 general-purpose registers, yet the setjmp implementation above saves only 7 of them (the callee-save registers). Why do the other registers not need to be saved for later restoration?
Recall that setjmp and longjmp are implemented as functions, so they follow the standard x86-64 calling convention. That convention dictates that the 7 registers above are owned by the caller (they are callee-saved), which means that a called function such as setjmp must restore them before it returns. The other registers are owned by the callee, which may clobber them however it likes before returning. So setjmp and longjmp are under no obligation to restore those registers, and the caller cannot make any assumptions about what they will contain.

References:
http://svnweb.freebsd.org/base/head/sys/amd64/amd64/support.S?revision=249439&view=markup#l657
http://blog.reverberate.org/2013/05/deep-wizardry-stack-unwinding.html

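13.5.2 setjmp and longjmp can cause memory leaks

The stack frames between the setjmp call and the longjmp call are discarded when longjmp is called, which can cause memory leaks.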

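#include <setjmp.h>
#include <stdlib.h>

jmp_buf env;            /* filled in by a setjmp call in some caller of f1 */

void f2(void);

void f1(void) {
    char *p = malloc(1024);

    f2();

    free(p);     /* never called, memory leak! */
}

void f2(void) {
    longjmp(env, 1);
}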

13.6 Signals: <signal.h>

See table 21.

Table 21: Signals
Function Description
void (*signal(int sig, void (*handler)(int)))(int) signal determines how subsequent signals will be handled.
int raise(int sig) raise sends the signal sig to the program; it returns non-zero if unsuccessful.

13.7 Date and Time Functions: <time.h>

See table 22.

Table 22: Date and Time Functions
Function Description
clock_t clock(void) clock returns the processor time used by the program since the beginning of execution.
time_t time(time_t *tp) time returns the current calendar time or -1 if the time is not available.
double difftime(time_t time2, time_t time1) difftime returns time2-time1 expressed in seconds.
time_t mktime(struct tm *tp) mktime converts the local time in the structure *tp into calendar time.
char *asctime(const struct tm *tp) asctime converts the time in the structure *tp into a string of the form "Sun Jan 3 15:14:13 1988\n\0"
char *ctime(const time_t *tp) It is equivalent to asctime(localtime(tp))
struct tm *gmtime(const time_t *tp) gmtime converts the calendar time *tp into Coordinated Universal Time (UTC).
struct tm *localtime(const time_t *tp) localtime converts the calendar time *tp into local time.
size_t strftime(char *s, size_t smax, const char *fmt, const struct tm *tp) strftime formats date and time information from *tp into s according to fmt.

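struct tm is defined in <time.h>. ISO C requires only the first nine members shown below; tm_gmtoff and tm_zone are extensions found on BSD and glibc systems.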

struct tm {
	int	tm_sec;		/* seconds after the minute [0-60] */
	int	tm_min;		/* minutes after the hour [0-59] */
	int	tm_hour;	/* hours since midnight [0-23] */
	int	tm_mday;	/* day of the month [1-31] */
	int	tm_mon;		/* months since January [0-11] */
	int	tm_year;	/* years since 1900 */
	int	tm_wday;	/* days since Sunday [0-6] */
	int	tm_yday;	/* days since January 1 [0-365] */
	int	tm_isdst;	/* Daylight Savings Time flag */
	long tm_gmtoff;	/* offset from UTC in seconds */
	char *tm_zone;	/* timezone abbreviation */
};

Example 1: measuring the CPU time used by a section of code

#include <stdio.h>
#include <time.h>

int main() {
    clock_t begin = clock();
    // Do stuff
    clock_t end = clock();
    double elapsed = (double)(end - begin) * 1000.0 / CLOCKS_PER_SEC;
    printf("CPU Time elapsed in milliseconds: %f", elapsed);
}

Reference: http://www.gnu.org/software/libc/manual/html_node/CPU-Time.html

Example 2: measuring the elapsed (wall-clock) time of a section of code

#include <stdio.h>
#include <time.h>

int main() {
    time_t start, end;
    time(&start);
    // Do stuff
    time(&end);
    double duration = difftime(end, start);
    printf("Time elapsed in seconds: %f\n", duration);
}

13.8 Diagnostics: <assert.h>

void assert(int expression)
If expression is zero, the assert macro will print on stderr a message, such as

Assertion failed: expression, file filename, line nnn

It then calls abort to terminate execution. If NDEBUG is defined at the time <assert.h> is included, the assert macro is ignored.

14 C FAQs

14.1 Library Functions

14.1.1 How to handle regular expressions and wildcards in C

14.1.1.1 POSIX regex in C

Unix-like systems generally provide a library for POSIX regular expressions.

#include <sys/types.h>
#include <regex.h>

int regcomp(regex_t *preg, const char *regex, int cflags);
    /* Prepare your regex for fast processing */

int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
    /* Do the matching */

void regfree(regex_t *preg);
    /* Free your compiled regex for a new "compiling" */

size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
    /* Retrieve some more information on why the regexec() failed */

An example of POSIX regex processing:

#include <stdio.h>
#include <stdlib.h>     /* exit */

#include <sys/types.h>
#include <regex.h>

int main(int argc, char *argv[])
{
    regex_t regex;
    int reti;
    char msgbuf[100];

    /* Compile regular expression */
    reti = regcomp(&regex, "^a[[:alnum:]]", 0);
    if( reti ) { fprintf(stderr, "Could not compile regex\n"); exit(1); }

    /* Execute regular expression */
    reti = regexec(&regex, "abc", 0, NULL, 0);
    if( !reti ) {
            puts("Match");
    } else if( reti == REG_NOMATCH ) {
            puts("No match");
    } else {
            regerror(reti, &regex, msgbuf, sizeof(msgbuf));
            fprintf(stderr, "Regex match failed: %s\n", msgbuf);
            exit(1);
    }

    /* Free compiled regular expression if you want to use the regex_t again */
    regfree(&regex);

    return 0;
}

References:
http://www.peope.net/old/regex.html
http://www.gnu.org/software/libc/manual/html_node/Regular-Expressions.html

14.1.1.2 Wildcard matching

In wildcard patterns, ? matches exactly one arbitrary character and * matches zero or more arbitrary characters.

Here is a quick little wildcard matcher by Arjan Kenter:

int match(char *pat, char *str)
{
    switch(*pat) {
    case '\0':  return !*str;
    case '*':   return match(pat+1, str) || *str && match(pat, str+1);
    case '?':   return *str && match(pat+1, str+1);
    default:    return *pat == *str && match(pat+1, str+1);
    }
}

/* (Copyright 1995, Arjan Kenter) */
/* With this definition, the call match("a*b.c", "aplomb.c") would return 1. */

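14.1.2 Processing command-line arguments (getopt)

POSIX systems provide the getopt function for processing command-line arguments. It requires the program's command-line arguments to follow these conventions:

  1. Each option is a single letter or digit;
  2. Every option begins with a hyphen '-'.

The prototype of getopt is: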

#include <unistd.h>
int getopt(int argc, char * const argv[], const char *optstring);

extern char *optarg;
extern int optind;
extern int optopt;
extern int opterr;

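The argc and argv parameters are the same as those passed to main. The optstring parameter is a string containing all supported option characters. If an option character is immediately followed by a colon, the option takes an argument; otherwise it is just a switch.
For example, if a command's usage is: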

command [-i] [-u username] [-z] filename

then "iu:z" should be passed to getopt as optstring.

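getopt is normally called in a loop that ends when getopt returns -1.
On each iteration, getopt returns the next option processed. When it encounters an invalid option, or an option whose required argument is missing, getopt returns a question mark '?'.

A '--' on the command line makes getopt stop processing and return -1. For example, to delete a file named -bar: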

$ rm -- -bar   # right
$ rm -bar      # wrong

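getopt uses four external variables:

optarg
If an option takes an argument, getopt points optarg at that option's argument string when it processes the option.
opterr
Setting opterr to 0 stops getopt from printing an error message when it encounters an invalid option.
optind
The index in argv of the next string to be processed. It starts at 1 (argv[0] is skipped) and is increased as getopt consumes arguments.
optopt
If an error is encountered, getopt sets optopt to the option character that caused the error.

Note: to handle options that are not a single character (such as --version), use the function getopt_long.

References: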
Advanced Programming in the UNIX Environment, 2nd Edition. page 773.
Advanced Programming in the UNIX Environment, 3rd Edition. page 662.

14.1.2.1 getopt example

The following program (testopt) demonstrates a typical use of getopt:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char **argv)
{
  int aflag = 0;
  int bflag = 0;
  char *cvalue = NULL;
  int index;
  int c;

  opterr = 0;
  while ((c = getopt (argc, argv, "abc:")) != -1) {
    switch (c) {
      case 'a':
        aflag = 1;
        break;
      case 'b':
        bflag = 1;
        break;
      case 'c':
        cvalue = optarg;
        break;
      case '?':
        if (optopt == 'c')
          fprintf (stderr, "Option -%c requires an argument.\n", optopt);
        else if (isprint (optopt))
          fprintf (stderr, "Unknown option `-%c'.\n", optopt);
        else
          fprintf (stderr,
                   "Unknown option character `\\x%x'.\n",
                   optopt);
        return 1;
      default:
        abort ();
    }
  }
  printf ("aflag = %d, bflag = %d, cvalue = %s\n",
          aflag, bflag, (cvalue == NULL) ? "null" : cvalue);

  for (index = optind; index < argc; index++)
    printf ("Non-option argument %s\n", argv[index]);
  return 0;
}

Test run:

$ ./testopt -abc test file1 file2
aflag = 1, bflag = 1, cvalue = test
Non-option argument file1
Non-option argument file2

Reference: http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Example-of-Getopt.html

14.1.2.2 Implementing getopt

Below is one implementation of getopt, from http://note.sonots.com/Comp/CompLang/cpp/getopt.html

#include <stdio.h>
#include <string.h>     /* strcmp, strchr */

/* Print "argv[0]: message -- c" followed by a newline, unless opterr is 0 */
#define ERR(s, c)   if(opterr){\
    fputs(argv[0], stderr);\
    fputs(s, stderr);\
    fputc(c, stderr);\
    fputc('\n', stderr);}

int opterr = 1;
int optind = 1;
int optopt;
char *optarg;

int
getopt(int argc, char **argv, char *opts)
{
    static int sp = 1;
    register int c;
    register char *cp;

    if(sp == 1)
        if(optind >= argc ||
           argv[optind][0] != '-' || argv[optind][1] == '\0')
            return(EOF);
        else if(strcmp(argv[optind], "--") == 0) {
            optind++;
            return(EOF);
        }
    optopt = c = argv[optind][sp];
    if(c == ':' || (cp=strchr(opts, c)) == NULL) {
        ERR(": illegal option -- ", c);
        if(argv[optind][++sp] == '\0') {
            optind++;
            sp = 1;
        }
        return('?');
    }
    if(*++cp == ':') {
        if(argv[optind][sp+1] != '\0')
            optarg = &argv[optind++][sp+1];
        else if(++optind >= argc) {
            ERR(": option requires an argument -- ", c);
            sp = 1;
            return('?');
        } else
            optarg = argv[optind++];
        sp = 1;
    } else {
        if(argv[optind][++sp] == '\0') {
            sp = 1;
            optind++;
        }
        optarg = NULL;
    }
    return(c);
}

14.2 Miscellaneous

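14.2.1 How to detect the machine's byte order

There are two ways to store an integer: little-endian (the least significant byte at the lowest address) and big-endian (the most significant byte at the lowest address).
Suppose an int is located at address 0x100 and its hexadecimal value is 0x01234567. The little-endian and big-endian layouts are:

Little-endian:
Address Value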
0x100  0x67
0x101  0x45
0x102  0x23
0x103  0x01

Big-endian:
Address Value
0x100  0x01
0x101  0x23
0x102  0x45
0x103  0x67

How can the machine's byte order be detected? The following program can be used:

int x = 1;
if(*(char *)&x == 1)
    printf("little-endian\n");
else
    printf("big-endian\n");

Or:

union {
    int i;
    char c[sizeof(int)];
} x;
x.i = 1;
if(x.c[0] == 1)
    printf("little-endian\n");
else
    printf("big-endian\n");

Reference: http://c-faq.com/misc/endiantest.html

15 C Tips

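15.1 Double exclamation marks

Two exclamation marks in a row, as in !!a, yield 0 when a is 0 and 1 when a is nonzero.
Analysis: every nonzero value is true, so !nonzero = 0 and !0 = 1.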

15.2 Is malloc thread-safe?

On any modern UNIX you'll get a thread-safe malloc by default. On Windows, use the /MT, /MTd, /MD or /MDd flag to link against a thread-safe runtime library.

References:
http://stackoverflow.com/questions/855763/is-malloc-thread-safe
http://www.360doc.com/content/12/0420/23/168576_205320609.shtml

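15.3 Do not rely on calloc to initialize pointers to NULL

Do not rely on calloc to initialize pointers to NULL: the C standard does not require that a null pointer be represented as all bits zero (every byte 0).

Reference: http://stackoverflow.com/questions/13251499/calloc-pointers-and-all-bits-zero

In fact, some platforms have a nonzero NULL; see http://c-faq.com/null/machexamp.html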

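15.4 A multi-process logging system

Requirement: implement a logging system in which several processes may write records to the log file at the same time, while ensuring that the records are not scrambled.

Q: How do we keep log records from being scrambled?
A: When multiple processes write a log, the writes must be atomic to keep records intact. To get atomic writes, open the file in append mode.
Reference: Advanced Programming in the UNIX Environment (2nd Edition), Section 3.11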

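Q: C library functions or system calls?
A: System calls, because they allow locking. The standard C library functions provide no way to lock a file.

Q: Why lock? If the file is opened in append mode, write already keeps records intact, so why is a lock still needed?
A: Because the log file needs a maximum size (otherwise it may exceed a file system limit and writes will fail). When the file reaches the maximum size, it can be truncated to 0 directly and written from the start, or copied to a backup file first and then truncated to 0.
Either way, a lock is required.

Q: How do we lock?
A: Options include fcntl, flock and lockf. With fcntl, it can be done like this: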

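#include <fcntl.h>      /* fcntl, struct flock, F_WRLCK, F_UNLCK, F_SETLKW */
#include <unistd.h>     /* SEEK_SET */

/* Add "advisory lock" to entire file. */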
int lock_file(int fd)
{
    struct flock fl;
    fl.l_type = F_WRLCK;
    fl.l_start = 0;
    fl.l_whence = SEEK_SET;
    fl.l_len = 0;
    return (fcntl(fd, F_SETLKW, &fl));
}

/* Release "advisory lock" */
int unlock_file(int fd)
{
    struct flock fl;
    fl.l_type = F_UNLCK;        //unlock
    fl.l_start = 0;
    fl.l_whence = SEEK_SET;
    fl.l_len = 0;
    return (fcntl(fd, F_SETLKW, &fl));
}

References:
Locking with fcntl: http://hi.baidu.com/mgqw864/item/de1e620f419a9edfdde5b091
A comparison of several locking methods: http://archive.cert.uni-stuttgart.de/isn/2003/06/msg00070.html


Author: cig01

Created: <2010-07-02 Fri 00:00>

Last updated: <2017-12-19 Tue 15:04>

Creator: Emacs 25.3.1 (Org mode 9.1.4)