R

1 Introduction to R

R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs.

References:
Official website: https://www.r-project.org/
R Tutorial (much of this article is excerpted from it): http://www.tutorialspoint.com/r/index.htm
The R Manuals: https://cran.r-project.org/manuals.html
An Introduction to R: https://cran.r-project.org/doc/manuals/r-release/R-intro.html
R Data Import/Export: https://cran.r-project.org/doc/manuals/r-release/R-data.html
R Language Definition: https://cran.r-project.org/doc/manuals/r-release/R-lang.html

2 Basic Usage of R

Once R is installed, it is ready to use. For example:

$ R                      # Start the interactive R environment
> print("hello world")
[1] "hello world"
> 1+2
[1] 3
> 5*7
[1] 35
> q()                    # Type q() to quit R

R is case-sensitive. Comments in source code begin with #.

2.1 Getting Help

To view the help page for a function, use either of the following methods.
Method 1:

> ?typeof

Method 2:

> help(typeof)
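
Beyond ?name and help(name), base R ships a few related helpers. The sketch below is only an illustration: apropos() and args() are standard R functions, but they are not mentioned in the original notes.

```r
# Find objects whose names match a regular expression.
matches <- apropos("^nchar")
print(matches)

# Show a function's argument list without opening its help page.
print(args(nchar))

# Interactive-only helpers, listed here as comments:
#   help.search("keyword")   searches all installed help pages
#   example(nchar)           runs the examples from a help page
```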

2.2 The Rscript Tool

Rscript runs R programs saved in a file. For example:

$ cat helloworld.R
str1 <- "Hello, World!"
print (str1)
$ Rscript  helloworld.R
[1] "Hello, World!"
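
Scripts run through Rscript often need command-line arguments. A minimal sketch, assuming a hypothetical script file named args.R (the name and the greeting logic are illustrative, not part of the original notes):

```r
# args.R: hypothetical script name used only for this illustration.
# trailingOnly = TRUE keeps only the arguments that follow the script name.
args <- commandArgs(trailingOnly = TRUE)

if (length(args) == 0) {
  name <- "World"          # default when no argument is given
} else {
  name <- args[1]
}

greeting <- paste0("Hello, ", name, "!")
print(greeting)
```

Running `Rscript args.R Alice` would print "Hello, Alice!", while running it with no arguments falls back to "Hello, World!".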

3 Basic Data Types

3.1 Vectors (the most basic type)

The simplest of these objects is the vector. Atomic vectors come in six data types, also called the six classes of vectors. The other R objects are built on top of atomic vectors.

Table 1: The six atomic vector types in R
typeof() result  mode() result  class() result  Examples
logical          logical        logical         TRUE, FALSE
integer          numeric        integer         0L, 2L, 5L
double           numeric        numeric         2.8, 100
complex          complex        complex         1 + 2i
character        character      character       "a", 'abc'
raw              raw            raw             charToRaw("Hello")

Note: the character type in R is what other languages call a string.
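
The rows of the table can be checked directly. The following sketch builds one sample value per type (the variable names are illustrative) and collects what typeof(), mode() and class() report:

```r
# One sample value for each of the six atomic types.
samples <- list(TRUE, 2L, 2.8, 1 + 2i, "a", charToRaw("A"))

types   <- sapply(samples, typeof)
modes   <- sapply(samples, mode)
classes <- sapply(samples, class)

# One row per sample, one column per inspection function.
print(data.frame(typeof = types, mode = modes, class = classes))
```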

3.1.1 Creating Multi-Element Vectors (c())

To create a Vector with more than one element, use the c() function. For example:

> v <- 8L
> print(v)
[1] 8
> v2 <- c(1L, 2L, 5L)
> print(v2)
[1] 1 2 5
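
Because an atomic vector holds a single type, c() silently converts mixed arguments to the most general type among them. A short sketch of this coercion (variable names are illustrative):

```r
mixed_num <- c(1L, 2.5)        # integer + double     -> double
mixed_lgl <- c(TRUE, 2L)       # logical + integer    -> integer
mixed_chr <- c(1, "a", TRUE)   # anything + character -> character

print(typeof(mixed_num))   # "double"
print(typeof(mixed_lgl))   # "integer"
print(mixed_chr)           # "1" "a" "TRUE"

# c() also flattens nested vectors into one vector.
flat <- c(c(1, 2), 3)
print(length(flat))        # 3
```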

3.1.2 Creating Multi-Element Numeric Vectors (:)

The : operator creates a numeric Vector with multiple elements. For example:

# Creating a sequence from 5 to 13.
v <- 5:13
print(v)

# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)

# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)

The program above produces the following output:

[1]  5  6  7  8  9 10 11 12 13
[1]  6.6  7.6  8.6  9.6 10.6 11.6 12.6
[1]  3.8  4.8  5.8  6.8  7.8  8.8  9.8 10.8
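
Two details of the : operator are easy to miss: it counts downward when the left operand is larger, and the result type depends on the starting value. The sketch below also shows seq_len(), a base function not covered in the original notes, as a safer choice when the upper bound may be zero:

```r
down <- 5:1
print(down)                 # 5 4 3 2 1

print(typeof(5:13))         # "integer"
print(typeof(6.6:12.6))     # "double"

# 1:n counts down when n is 0, which is rarely what a loop wants.
n <- 0
print(1:n)                  # 1 0
print(seq_len(n))           # integer(0)
```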

3.1.3 Creating Multi-Element Vectors (seq)

seq() generates a sequence with an explicit step size, given by the by argument:

> seq(5, 9, by=0.4)
 [1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
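
seq() can also be driven by the desired number of elements instead of the step size. A brief sketch using the standard length.out argument and the related seq_along() function (variable names are illustrative):

```r
by_step   <- seq(5, 9, by = 0.4)           # fixed step, 11 elements
by_length <- seq(0, 1, length.out = 5)     # fixed number of elements
positions <- seq_along(c("a", "b", "c"))   # index sequence for a vector

print(length(by_step))   # 11
print(by_length)         # 0.00 0.25 0.50 0.75 1.00
print(positions)         # 1 2 3
```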

3.1.4 Accessing Vector Elements ([])

Elements of a Vector are accessed using indexing. The [ ] brackets are used for indexing. Indexing starts with position 1.

Example:

$ cat access_elm.R
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

# Accessing vector elements using position.
u <- t[c(2,3,6)]
print(u)

# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)

# Indexing with 0 and 1: a 0 index selects nothing and is dropped,
# so only position 1 ("Sun") is returned. This is not a 0/1 mask.
y <- t[c(0,0,0,0,0,0,1)]
print(y)
$ Rscript  access_elm.R
[1] "Mon" "Tue" "Fri"
[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"
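
The last case above is worth a closer look: numeric zeros are positions, not a mask. The sketch below contrasts it with a true logical mask and shows two common base helpers, %in% and which() (the variable names are illustrative):

```r
days <- c("Sun", "Mon", "Tue", "Wed", "Thurs", "Fri", "Sat")

# Numeric indexing: zeros are dropped, 1 selects the first element.
zero_dropped <- days[c(0, 0, 1)]
print(zero_dropped)            # "Sun"

# A real mask must be logical; here only the 7th position is TRUE.
mask    <- as.logical(c(0, 0, 0, 0, 0, 0, 1))
by_mask <- days[mask]
print(by_mask)                 # "Sat"

# Logical vectors are usually produced by a comparison.
weekend <- days[days %in% c("Sat", "Sun")]
print(weekend)                 # "Sun" "Sat"

# which() turns a logical vector back into positions.
where <- which(days == "Wed")
print(where)                   # 4
```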

3.1.5 Sorting Vector Elements (sort)

Elements in a vector can be sorted with the sort() function.

$ cat sort.R
v <- c(3,8,4,5,0,11, -9, 304)
print(sort(v))
print(sort(v, decreasing = TRUE))
$ Rscript sort.R
[1]  -9   0   3   4   5   8  11 304
[1] 304  11   8   5   4   3   0  -9
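
sort() returns the sorted values; when a second vector must be rearranged in the same way, the related base function order() returns the sorting positions instead. A small sketch (the labels vector is an illustrative assumption):

```r
v      <- c(3, 8, 4, 5, 0, 11, -9, 304)
labels <- c("a", "b", "c", "d", "e", "f", "g", "h")

# order() gives the positions of v from smallest to largest value.
idx <- order(v)
print(idx)           # 7 5 1 3 4 2 6 8

# The same positions can reorder any parallel vector.
print(labels[idx])   # "g" "e" "a" "c" "d" "b" "f" "h"

# sort() does not modify its argument.
print(v[1])          # 3
```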

3.1.6 Arithmetic on Vectors

Arithmetic operators work element by element, as the following example shows:

$ cat arithmetic.R
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)

print(v1 + v2)
print(v1 - v2)
print(v1 * v2)
print(v1 / v2)

$ Rscript arithmetic.R
[1]  7 19  4 13  1 13
[1] -1 -3  4 -3 -1  9
[1] 12 88  0 40  0 22
[1] 0.7500000 0.7272727       Inf 0.6250000 0.0000000 5.5000000

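If the two Vectors have different lengths, the shorter one is automatically extended by recycling its elements. For example:

$ cat arithmetic2.R
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)          # v2 is shorter than v1, so it is recycled to c(4,11,4,11,4,11)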

print(v1 + v2)
print(v1 - v2)

$ Rscript arithmetic2.R
[1]  7 19  8 16  4 22
[1] -1 -3  0 -6 -4  0
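
Recycling is silent only when the longer length is a multiple of the shorter one. Otherwise R still computes a result but emits a warning, as this sketch shows (variable names are illustrative):

```r
a <- c(1, 2, 3)
b <- c(10, 20)

# b is recycled to c(10, 20, 10); R warns because 3 is not a multiple of 2.
result <- suppressWarnings(a + b)
print(result)        # 11 22 13

# Detect the warning explicitly.
warned <- tryCatch({ a + b; FALSE }, warning = function(w) TRUE)
print(warned)        # TRUE

# A single number is the most common case of recycling.
print(a * 2)         # 2 4 6
```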

3.1.7 Strings (single or double quotes both work)

Any value written within a pair of single or double quotes in R is treated as a string. R displays every string in double quotes, so a string created with single quotes is printed with double quotes.

String examples:

> a <- 'Start and end with single quote'
> print(a)
[1] "Start and end with single quote"
>
> b <- "Start and end with double quotes"
> print(b)
[1] "Start and end with double quotes"
>
> c <- "single quote ' in between double quotes"
> print(c)
[1] "single quote ' in between double quotes"
>
> d <- 'Double quotes " in between single quote'
> print(d)
[1] "Double quotes \" in between single quote"
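
The backslash in the last output is part of how print() displays the string, not part of the string itself. A short sketch that makes the difference visible with cat() and nchar():

```r
d <- 'Double quotes " in between single quote'

print(d)                 # displayed with an escaped \" inside
cat(d, "\n", sep = "")   # writes the raw characters, no backslash

# The quote character is a single character in the stored string.
quote_len <- nchar('"')
print(quote_len)         # 1

# Single-quoted and double-quoted literals create identical strings.
same <- identical('abc', "abc")
print(same)              # TRUE
```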

3.1.7.1 Basic String Functions

paste() concatenates strings, separated by a space by default. For example:

> a <- "hello"
> b <- "world"
> c <- "R"
> paste(a,b,c)
[1] "hello world R"
> paste(a,b,c, sep="-")
[1] "hello-world-R"

For the full documentation of paste, run ?paste or help(paste) in R.

Function         Description                                     Example
nchar            Count characters                                nchar('abc') returns 3
paste            Concatenate strings                             paste('ab','c') returns "ab c"
substring        Extract a substring (more general than substr)  substring("Extract", 5) returns "act"
substr           Extract a substring                             substr("Extract", 5, 7) returns "act"
toupper/tolower  Convert case                                    toupper('abc') returns "ABC"
sub              Replace the first match                         sub("a", "A", "abcabc") returns "Abcabc"
gsub             Replace all matches                             gsub("a", "A", "abcabc") returns "AbcAbc"
chartr           Translate characters                            chartr("iXs", "why", "MiXeD cAsE 123") returns "MwheD cAyE 123"
format           Format a string                                 format("Hello", width = 8, justify = "l") returns "Hello   "
strsplit         Split a string                                  strsplit("abcabc", "b") returns a list holding "a" "ca" "c"
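
A few entries in the table have details that are easier to see in running code: format() pads with spaces, strsplit() returns a list rather than a vector, and substring() is vectorized over its positions. A minimal sketch (variable names are illustrative):

```r
# format() pads to the requested width; the result has 8 characters.
padded <- format("Hello", width = 8, justify = "left")
print(nchar(padded))     # 8

# strsplit() returns a list with one element per input string.
pieces <- strsplit("abcabc", "b")
print(class(pieces))     # "list"
print(pieces[[1]])       # "a"  "ca" "c"

# substring() accepts vectors of start and end positions.
windows <- substring("Extract", 1:3, 3:5)
print(windows)           # "Ext" "xtr" "tra"

# sub() changes only the first match, gsub() changes all of them.
print(sub("a", "A", "abcabc"))    # "Abcabc"
print(gsub("a", "A", "abcabc"))   # "AbcAbc"
```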

3.2 Lists (similar to Vectors, but elements may have different types)

A list is an R object that can contain elements of many different types.

Lists are created with the list() function. For example:

> list1 <- list(c(2,5,3), 21.3, sin)    # sin is a function
> print(list1)
[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x)  .Primitive("sin")
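
To confirm that each element keeps its own type, the list can be inspected element by element. A short sketch using sapply() and str(), both base functions that the original notes do not cover:

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

print(length(list1))            # 3

# class() of each element; the third one is a function.
element_classes <- sapply(list1, class)
print(element_classes)          # "numeric" "numeric" "function"

# str() prints a compact summary of the whole structure.
str(list1)

# A function stored in a list can be called after extracting it.
print(list1[[3]](0))            # 0
```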

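3.2.1 Accessing List Elements ([])

[] accesses elements of a List; indices start at 1 (not 0). Note that [] returns a sub-list, which is why the result below is printed under [[1]]. For example:

> list1 <- list("Jan", "Feb", "Mar")
> list1[2]
[[1]]
[1] "Feb"

> list1[2] <- "XYZ"                     # Modify the 2nd element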
> list1
[[1]]
[1] "Jan"

[[2]]
[1] "XYZ"

[[3]]
[1] "Mar"
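
The difference between single and double brackets matters in practice: [ ] keeps the list wrapper, while [[ ]] extracts the element itself. A minimal sketch (variable names are illustrative):

```r
list1 <- list("Jan", "Feb", "Mar")

sub_list <- list1[2]      # a list of length 1
element  <- list1[[2]]    # the character value itself

print(class(sub_list))    # "list"
print(class(element))     # "character"
print(element)            # "Feb"

# [[ ]] can also be used on the left-hand side to replace one element.
list1[[2]] <- "XYZ"
print(list1[[2]])         # "XYZ"
```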

3.2.2 Naming List Elements (names)

The names function assigns names to the elements of a List. For example:

$ cat list_names.R
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow=2), list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")        # Use names to name the elements

# Show the list.
print("Show the list")
print(list_data)

print("Show elements `1st Quarter`")
print(list_data$`1st Quarter`)           # Access a List element by name

print("Show element `A_Matrix`")
print(list_data$`A_Matrix`)              # Access by name; can be shortened to print(list_data$A_Matrix)

$ Rscript list_names.R
[1] "Show the list"
$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3


[1] "Show elements `1st Quarter`"
[1] "Jan" "Feb" "Mar"
[1] "Show element `A_Matrix`"
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
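
Named elements can also be reached with [[ ]] and a string, which is useful when the name is held in a variable (the $ operator only accepts a literal name). A sketch reusing the same list; the key variable is an illustrative assumption:

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

print(names(list_data))

# Look up an element through a name stored in a variable.
key <- "A_Matrix"
m   <- list_data[[key]]
print(m[2, 3])                 # 8

# Chained [[ ]] reaches into the nested list.
inner_first <- list_data[["A Inner list"]][[1]]
print(inner_first)             # "green"
```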

3.2.3 Merging Lists (c)

c() merges Lists. For example:

> list1 <- list("Sun","Mon")
> list2 <- list(1, 2)
> c(list1, list2)               # Merge two lists.
[[1]]
[1] "Sun"

[[2]]
[1] "Mon"

[[3]]
[1] 1

[[4]]
[1] 2
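
c() always appends at the end. When the new elements belong in the middle, the base function append() with its after argument is one option; the sketch below is an illustration and not part of the original notes:

```r
list1 <- list("Sun", "Mon")
list2 <- list(1, 2)

merged <- c(list1, list2)
print(length(merged))      # 4

# Insert list2 after the first element of list1.
inserted <- append(list1, list2, after = 1)
str(inserted)              # "Sun", 1, 2, "Mon"
```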

3.2.4 Converting Lists to Vectors

The unlist function converts Lists to Vectors. For example:

> list1 <- list(1:5)
> list2 <- list(10:14)
> list1 + list2             # The + operator cannot be applied to Lists directly
Error in list1 + list2 : non-numeric argument to binary operator
> mapply("+", list1, list2)
     [,1]
[1,]   11
[2,]   13
[3,]   15
[4,]   17
[5,]   19
> v1 <- unlist(list1)       # Use unlist to convert a List to a Vector
> v2 <- unlist(list2)
> v1 + v2
[1] 11 13 15 17 19
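
Two further points about this conversion, sketched below: Map() is a base alternative to mapply() that keeps the result as a list, and unlist() applies the same type coercion as c() when the elements differ in type (variable names are illustrative):

```r
list1 <- list(1:5)
list2 <- list(10:14)

# Map() applies + to corresponding elements and returns a list.
sums <- Map("+", list1, list2)
print(sums[[1]])              # 11 13 15 17 19

# Mixed element types are coerced to a common type.
mixed <- unlist(list(1, "a", TRUE))
print(mixed)                  # "1" "a" "TRUE"

# Names of list elements are carried over to the vector.
named <- unlist(list(a = 1, b = 2))
print(names(named))           # "a" "b"
```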

4 Control Structures

5 Tips

5.1 Type Detection: typeof and mode

typeof() gives the "type" of object from R's point of view, whilst mode() gives the "type" of object from the point of view of Becker, Chambers & Wilks (1988). The latter may be more compatible with other S implementations according to the R Language Definition manual.

> x <- 1:4
> typeof(x)
[1] "integer"
> mode(x)
[1] "numeric"

Summary: both typeof() and mode() report the type of an object; the results of mode() are more compatible with the S language.
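
The two functions disagree mainly for numbers and functions, and class() adds a third view for higher-level objects. A short comparison sketch (the data frame is an illustrative example, not from the original notes):

```r
# Functions: typeof() distinguishes kinds, mode() reports "function" for all.
print(typeof(sum))     # "builtin"
print(typeof(mean))    # "closure"
print(mode(sum))       # "function"
print(mode(mean))      # "function"

# A data frame is stored as a list; class() reports the user-facing type.
df <- data.frame(x = 1:2)
print(typeof(df))      # "list"
print(mode(df))        # "list"
print(class(df))       # "data.frame"
```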

References:
http://stats.stackexchange.com/questions/3212/mode-class-and-type-of-r-objects
http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html#Objects

5.2 Assignment Operators: <- and =

R has several assignment operators. At the top level, each of the following statements assigns value to x:

x <- value
x <<- value
value -> x
value ->> x
x = value

The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.

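In general, <- is the most commonly used. Note that <<- and ->> differ from the others: they assign in an enclosing environment rather than in the current one.
For the differences between the assignment operators, see: https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html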
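
The operators behave the same at the top level but differ inside functions. The sketch below uses a hypothetical make_counter() function to show <<- updating a variable in the enclosing function, and shows <- being used inside a function call:

```r
# make_counter is a hypothetical helper written for this illustration.
make_counter <- function() {
  count <- 0                 # local to make_counter
  function() {
    count <<- count + 1      # updates count in the enclosing function
    count
  }
}

counter <- make_counter()
first   <- counter()
second  <- counter()
print(second)                # 2

# <- may appear inside a call and assigns as a side effect.
total <- sum(vals <- c(1, 2, 3))
print(vals)                  # 1 2 3
print(total)                 # 6

# The right-pointing form assigns to the name on the right.
5 -> z
print(z)                     # 5
```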


Author: cig01

Created: <2015-12-05 Sat 00:00>

Last updated: <2015-12-12 Sat 23:54>

Creator: Emacs 25.1.1 (Org mode 9.0.7)