R

1. Introduction to R

R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs.

References:
Official website: https://www.r-project.org/
R Tutorial (much of this document is drawn from it): http://www.tutorialspoint.com/r/index.htm
The R Manuals: https://cran.r-project.org/manuals.html
An Introduction to R: https://cran.r-project.org/doc/manuals/r-release/R-intro.html
R Data Import/Export: https://cran.r-project.org/doc/manuals/r-release/R-data.html
R Language Definition: https://cran.r-project.org/doc/manuals/r-release/R-lang.html

2. Basic Usage

Once R is installed, it is ready to use. For example:

$ R                      # Start the interactive R environment
> print("hello world")
[1] "hello world"
> 1+2
[1] 3
> 5*7
[1] 35
> q()                    # Type q() to quit R

R is case-sensitive. Comments in source code start with #.

2.1. Getting Help

To read the help page for a function, use either of the following methods.
Method 1:

> ?typeof

Method 2:

> help(typeof)

2.2. The Rscript Tool

Rscript runs an R program saved in a file. For example:

$ cat helloworld.R
str1 <- "Hello, World!"
print (str1)
$ Rscript  helloworld.R
[1] "Hello, World!"
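
Scripts run through Rscript can also read command-line arguments. The following is a minimal sketch, not part of the original notes: the file name args.R and the fallback value "World" are illustrative assumptions, while commandArgs() is a base R function.

```r
# args.R (hypothetical file name); run as:  Rscript args.R Alice
# trailingOnly = TRUE keeps only the arguments that follow the script name.
args <- commandArgs(trailingOnly = TRUE)

# Fall back to a default when no argument was given.
name <- if (length(args) >= 1) args[1] else "World"

greeting <- paste0("Hello, ", name, "!")
print(greeting)
```

Invoked without arguments, the script falls back to the default name.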

3. Basic Data Types

3.1. Vectors (the most basic type)

The simplest R object is the vector. Atomic vectors come in six data types (also called the six classes of vectors), and the other R objects are built on top of them.

Table 1: The six atomic vector types in R
typeof()     mode()       class()      Examples
logical      logical      logical      TRUE, FALSE
integer      numeric      integer      0L, 2L, 5L
double       numeric      numeric      2.8, 100
complex      complex      complex      1 + 2i
character    character    character    "a", 'abc'
raw          raw          raw          charToRaw("Hello")

Note: in R, the character type is the string type.
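
The rows of Table 1 can be checked with a short script. This is a sketch for illustration only; the variable names (samples, types, modes, classes) are my own choices and not from the original notes.

```r
# One sample value for each of the six atomic vector types.
samples <- list(TRUE, 2L, 2.8, 1 + 2i, "abc", charToRaw("Hello"))

# Ask typeof(), mode() and class() about every sample.
types   <- sapply(samples, typeof)
modes   <- sapply(samples, mode)
classes <- sapply(samples, class)

# Show the three answers side by side, one row per sample.
print(cbind(typeof = types, mode = modes, class = classes))
```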

3.1.1. Creating Multi-Element Vectors (c())

To build a vector with more than one element from individual values, use the function c(). For example:

> v <- 8L
> print(v)
[1] 8
> v2 <- c(1L, 2L, 5L)
> print(v2)
[1] 1 2 5
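
Because every element of an atomic vector has the same type, c() coerces mixed arguments to a common type. The sketch below illustrates this; the variable names are illustrative and not from the original notes.

```r
# c() flattens its arguments into a single vector.
v <- c(1L, 2L, c(5L, 8L))
print(v)

# Mixed types are coerced to the most general type present.
mixed_num <- c(1L, 2.5)          # integer and double   -> double
mixed_chr <- c(1L, TRUE, "a")    # anything and a string -> character
print(typeof(mixed_num))
print(typeof(mixed_chr))
print(mixed_chr)
```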

3.1.2. Creating Multi-Element Numeric Vectors (:)

The : operator creates a numeric vector that steps by 1. For example:

# Creating a sequence from 5 to 13.
v <- 5:13
print(v)

# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)

# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)

The program above produces the following output:

[1]  5  6  7  8  9 10 11 12 13
[1]  6.6  7.6  8.6  9.6 10.6 11.6 12.6
[1]  3.8  4.8  5.8  6.8  7.8  8.8  9.8 10.8
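
Two details of the : operator are easy to miss: the type of the result, and its precedence relative to arithmetic. The following sketch is an illustration under my own variable names (n, a, b), not code from the original notes.

```r
# The result is an integer vector when the start is a whole number,
# and a double vector otherwise.
print(typeof(5:13))
print(typeof(6.6:12.6))

# ':' can also count downwards.
down <- 3:-1
print(down)

# ':' binds tighter than '-', so parenthesize when needed.
n <- 5
a <- 1:n - 1      # parsed as (1:n) - 1
b <- 1:(n - 1)
print(a)
print(b)
```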

3.1.3. Creating Multi-Element Vectors (seq)

seq() creates a sequence with an explicit step size (by). For example:

> seq(5, 9, by=0.4)
 [1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
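
Besides by, seq() accepts length.out, and base R provides the helpers seq_len() and seq_along(). A minimal sketch follows; the variable names are illustrative.

```r
by_step   <- seq(5, 9, by = 0.4)         # fixed step size
by_length <- seq(0, 1, length.out = 5)   # fixed number of elements
print(by_step)
print(by_length)

# seq_along() and seq_len() give an empty sequence for zero length,
# whereas 1:0 counts down and yields two elements.
days <- c("Sun", "Mon", "Tue")
print(seq_along(days))
print(seq_len(0))
print(1:0)
```

For loops over a vector that may be empty, seq_along() is therefore usually a safer choice than 1:length(x).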

3.1.4. Accessing Vector Elements ([])

Elements of a Vector are accessed using indexing. The [ ] brackets are used for indexing. Indexing starts with position 1.

Example:

$ cat access_elm.R
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

# Accessing vector elements using position.
u <- t[c(2,3,6)]
print(u)

# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)

# Zeros in a numeric index are ignored, so only position 1 is selected (this is not a logical mask).
y <- t[c(0,0,0,0,0,0,1)]
print(y)
$ Rscript  access_elm.R
[1] "Mon" "Tue" "Fri"
[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"
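
Indexing also works with a computed condition and with element names. The sketch below reuses the weekday vector under the name days (to avoid shadowing the base function t()); the hours vector is a made-up example.

```r
days <- c("Sun", "Mon", "Tue", "Wed", "Thurs", "Fri", "Sat")

# A comparison yields a logical vector that can be used directly as an index.
long_names <- days[nchar(days) > 3]
print(long_names)

# which() turns a logical vector into positions.
wed_pos <- which(days == "Wed")
print(wed_pos)

# Elements can be given names and then selected by name.
hours <- c(Mon = 8, Tue = 6, Wed = 7)
print(hours["Tue"])
print(hours[c("Mon", "Wed")])

# A position past the end yields NA rather than an error.
print(days[10])
```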

3.1.5. Sorting Vector Elements (sort)

The elements of a vector can be sorted with the sort() function.

$ cat sort.R
v <- c(3,8,4,5,0,11, -9, 304)
print(sort(v))
print(sort(v, decreasing = TRUE))
$ Rscript sort.R
[1]  -9   0   3   4   5   8  11 304
[1] 304  11   8   5   4   3   0  -9
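
A related base function, order(), returns the sorting permutation instead of the sorted values. The sketch below shows how it can reorder a second vector in step with the first; the tags vector is a made-up example.

```r
v <- c(3, 8, 4, 5, 0, 11, -9, 304)

# sort() returns a sorted copy; v itself is left unchanged.
s <- sort(v)

# order() returns the positions that would sort v, which is useful
# for reordering a parallel vector in the same way.
idx  <- order(v)
tags <- c("a", "b", "c", "d", "e", "f", "g", "h")
print(idx)
print(tags[idx])
```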

3.1.6. Arithmetic on Vectors

Arithmetic operators work element by element. For example:

$ cat arithmetic.R
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)

print(v1 + v2)
print(v1 - v2)
print(v1 * v2)
print(v1 / v2)

$ Rscript arithmetic.R
[1]  7 19  4 13  1 13
[1] -1 -3  4 -3 -1  9
[1] 12 88  0 40  0 22
[1] 0.7500000 0.7272727       Inf 0.6250000 0.0000000 5.5000000

If the two vectors have different lengths, the shorter one is automatically extended by recycling its elements. For example:

$ cat arithmetic2.R
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)          # v2 is shorter than v1, so it is recycled as c(4,11,4,11,4,11)

print(v1 + v2)
print(v1 - v2)

$ Rscript arithmetic2.R
[1]  7 19  8 16  4 22
[1] -1 -3  0 -6 -4  0
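
When the longer length is not a multiple of the shorter one, R still recycles but emits a warning. The following sketch captures that warning so the script can continue; the variable names are illustrative, and the handler pattern is one possible way to record a warning.

```r
v1 <- c(1, 2, 3, 4, 5)
v2 <- c(10, 20)    # recycled as c(10, 20, 10, 20, 10)

# Record the warning text and still obtain the result.
warning_text <- NULL
result <- withCallingHandlers(
  v1 + v2,
  warning = function(w) {
    warning_text <<- conditionMessage(w)   # remember the message
    invokeRestart("muffleWarning")         # then continue silently
  }
)
print(result)
print(warning_text)
```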

3.1.7. Strings (single or double quotes both work)

Any value written within a pair of single or double quotes is treated as a string in R. R displays every string in double quotes, so a string created with single quotes is shown with double quotes on output.

String examples:

> a <- 'Start and end with single quote'
> print(a)
[1] "Start and end with single quote"
>
> b <- "Start and end with double quotes"
> print(b)
[1] "Start and end with double quotes"
>
> c <- "single quote ' in between double quotes"
> print(c)
[1] "single quote ' in between double quotes"
>
> d <- 'Double quotes " in between single quote'
> print(d)
[1] "Double quotes \" in between single quote"

3.1.7.1. Basic String Functions

paste() concatenates strings. For example:

> a <- "hello"
> b <- "world"
> c <- "R"
> paste(a,b,c)
[1] "hello world R"
> paste(a,b,c, sep="-")
[1] "hello-world-R"

For the full documentation of paste, run ?paste or help(paste) in the R environment.

Function         Description                                      Example
nchar            Count characters                                 nchar('abc') gives 3
paste            Concatenate strings                              paste('ab','c') gives "ab c"
substring        Extract a substring (more general than substr)   substring("Extract", 5) gives "act"
substr           Extract a substring                              substr("Extract", 5, 7) gives "act"
toupper/tolower  Convert case                                     toupper('abc') gives "ABC"
sub              Replace the first match                          sub("a", "A", "abcabc") gives "Abcabc"
gsub             Replace all matches                              gsub("a", "A", "abcabc") gives "AbcAbc"
chartr           Translate characters                             chartr("iXs", "why", "MiXeD cAsE 123") gives "MwheD cAyE 123"
format           Format a string                                  format("Hello", width = 8, justify = "l") gives "Hello   "
strsplit         Split a string                                   strsplit("abcabc", "b") gives a list whose only element is "a" "ca" "c"
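
Several of these functions are often combined. The sketch below is an illustration with a made-up input string; it uses only functions listed in the table.

```r
s <- "statistical computing"

# Capitalize the first letter: upper-case character 1, then append the rest.
capitalized <- paste(toupper(substr(s, 1, 1)), substring(s, 2), sep = "")
print(capitalized)

# strsplit() returns a list with one element per input string,
# so [[1]] is needed to get the character vector of pieces.
words <- strsplit(s, " ")[[1]]
print(words)
print(nchar(words))

# paste() with collapse joins the elements of one vector into a single string.
joined <- paste(words, collapse = "_")
print(joined)
```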

3.2. Lists (similar to vectors, but elements may have different types)

A list is an R object that can contain elements of many different types.

Use the function list() to create a list. For example:

> list1 <- list(c(2,5,3), 21.3, sin)    # sin is a function
> print(list1)
[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x)  .Primitive("sin")

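3.2.1. Accessing List Elements ([])

[] accesses the elements of a list; indices start at 1, not 0. Note that [] returns a sub-list (hence the [[1]] in the output below), while [[ ]] returns the element itself. For example: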

> list1 <- list("Jan", "Feb", "Mar")
> list1[2]
[[1]]
[1] "Feb"

> list1[2] <- "XYZ"                     # Modify the 2nd element
> list1
[[1]]
[1] "Jan"

[[2]]
[1] "XYZ"

[[3]]
[1] "Mar"
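
The difference between [ ] and [[ ]] matters as soon as the extracted value is passed to another function. A minimal sketch, with illustrative variable names:

```r
list1 <- list("Jan", "Feb", "Mar")

sub_list <- list1[2]      # still a list, of length 1
element  <- list1[[2]]    # the character string itself

print(class(sub_list))
print(class(element))
print(toupper(element))

# [ ] with several positions returns a list holding those elements.
picked <- list1[c(1, 3)]
print(length(picked))
```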

3.2.2. Naming List Elements (names)

The function names() assigns names to the elements of a list. For example:

$ cat list_names.R
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow=2), list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")        # Name the elements with names()

# Show the list.
print("Show the list")
print(list_data)

print("Show elements `1st Quarter`")
print(list_data$`1st Quarter`)           # Access a list element by name

print("Show element `A_Matrix`")
print(list_data$`A_Matrix`)              # Access by name; can be shortened to print(list_data$A_Matrix)

$ Rscript list_names.R
[1] "Show the list"
$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3


[1] "Show elements `1st Quarter`"
[1] "Jan" "Feb" "Mar"
[1] "Show element `A_Matrix`"
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
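
Names can also be given when the list is created, and a named element can be reached in several equivalent ways. The sketch below uses a made-up list called person; none of its names come from the original notes.

```r
# Names supplied directly in the call to list().
person <- list(name = "Ada", scores = c(90, 85))

# Three equivalent ways to reach the same element.
print(person$scores)
print(person[["scores"]])
print(person[[2]])

# Assigning to a new name appends an element; assigning NULL removes one.
person$city <- "London"
person$name <- NULL
print(names(person))
```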

3.2.3. Merging Lists (c)

c() can merge lists. For example:

> list1 <- list("Sun","Mon")
> list2 <- list(1, 2)
> c(list1, list2)               # Merge two lists.
[[1]]
[1] "Sun"

[[2]]
[1] "Mon"

[[3]]
[1] 1

[[4]]
[1] 2
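
When the second list should go somewhere other than the end, the base function append() may be used instead. A small sketch with illustrative variable names:

```r
list1 <- list("Sun", "Mon")
list2 <- list(1, 2)

# c() always appends at the end.
merged <- c(list1, list2)
print(length(merged))

# append() behaves like c() but can insert after a chosen position.
inserted <- append(list1, list2, after = 1)
str(inserted)
```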

3.2.4. Converting Lists to Vectors

The function unlist() converts a list to a vector. For example:

> list1 <- list(1:5)
> list2 <-list(10:14)
> list1 + list2             # Addition cannot be applied to lists directly
Error in list1 + list2 : non-numeric argument to binary operator
> mapply("+", list1, list2)
     [,1]
[1,]   11
[2,]   13
[3,]   15
[4,]   17
[5,]   19
> v1 <- unlist(list1)       # Convert the list to a vector with unlist
> v2 <- unlist(list2)
> v1 + v2
[1] 11 13 15 17 19
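
Two further points may be worth seeing in code: Map() keeps the result as a list where mapply() simplified it to a matrix, and unlist() coerces mixed element types to one common type. This is a sketch with illustrative variable names.

```r
list1 <- list(1:5)
list2 <- list(10:14)

# Map() applies "+" pairwise and returns a list without simplifying.
summed <- Map("+", list1, list2)
print(summed)

# unlist() flattens nested lists; mixed types end up as one common type.
nested <- list(1, list(2, 3), "four")
flat <- unlist(nested)
print(flat)
print(typeof(flat))
```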

4. Control Structures

5. Tips

5.1. Type Inspection: typeof and mode

typeof() gives the "type" of object from R's point of view, whilst mode() gives the "type" of object from the point of view of Becker, Chambers & Wilks (1988). The latter may be more compatible with other S implementations according to the R Language Definition manual.

> x <- 1:4
> typeof(x)
[1] "integer"
> mode(x)
[1] "numeric"

Summary: both typeof() and mode() report the type of an object; the results of mode() are more compatible with the S language.

References:
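
The difference shows up beyond numeric vectors as well. The sketch below defines a hypothetical helper, describe(), that reports typeof(), mode() and the first class() entry for a few objects; the helper is my own illustration, not a base R function.

```r
# Hypothetical helper: collect the three answers for one object.
describe <- function(x) {
  c(typeof = typeof(x), mode = mode(x), class = class(x)[1])
}

print(describe(1:4))
print(describe(2.5))
print(describe(sum))        # a primitive function
print(describe(mean))       # an ordinary R function (closure)
print(describe(quote(x)))   # an unevaluated symbol
```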
http://stats.stackexchange.com/questions/3212/mode-class-and-type-of-r-objects
http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html#Objects

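5.2. Assignment Operators: <- and =

R has several assignment operators. At the top level, each of the following statements assigns value to x (inside a function, <<- and ->> instead assign in an enclosing environment):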

x <- value
x <<- value
value -> x
value ->> x
x = value

The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.

In general, <- is the more commonly used operator.
For the differences between the assignment operators, see: https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html
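
The practical differences between <-, <<- and = are easiest to see inside functions. The following is a minimal sketch; all function and variable names (bump_local, make_counter, total_of, kept) are hypothetical and chosen for illustration.

```r
# '<-' inside a function creates a local variable.
total <- 0
bump_local <- function() {
  total <- total + 1     # a new local 'total'; the outer one is untouched
  total
}
local_result <- bump_local()
print(total)

# '<<-' assigns to a variable found in an enclosing environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates 'count' owned by make_counter()
    count
  }
}
counter <- make_counter()
counter()
counter()
third <- counter()
print(third)

# Inside a call, '=' names an argument while '<-' performs an assignment.
total_of <- function(values) sum(values)
s1 <- total_of(values = 1:3)   # only names the argument
s2 <- total_of(kept <- 1:3)    # assigns 'kept', then passes its value
print(kept)
```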

Author: cig01

Created: <2015-12-05 Sat>

Last updated: <2015-12-12 Sat>

Creator: Emacs 27.1 (Org mode 9.4)