# Bagging and Random Forest

## 1 Introduction to Bagging and RF

Bagging and Random Forest are both ensemble learning methods, and Random Forest is an extended variant of Bagging.

## 2 Bagging (Bootstrap AGGregatING)

Bagging (Bootstrap AGGregatING) is an ensemble learning method proposed by Leo Breiman in 1994.

The basic Bagging procedure: given a training set of $$N$$ samples, draw $$T$$ bootstrap samples from it (each of size $$N$$, sampled with replacement), train one base learner on each bootstrap sample, and then combine the base learners. How are they combined? For classification tasks, Bagging typically uses simple majority voting (the class receiving the most votes becomes the final prediction); for regression tasks, it uses simple averaging (the mean of the $$T$$ base learners' outputs becomes the final output).
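The procedure above can be sketched as follows. This is a minimal illustration, not a production implementation: the decision-tree base learner, the synthetic dataset, and the values of $$T$$ are arbitrary choices for the sketch, not part of the algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, random_state=0)
N, T = len(X), 25

# Train T base learners, each on a bootstrap sample of size N.
learners = []
for _ in range(T):
    idx = rng.integers(0, N, size=N)  # sampling with replacement
    learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Classification: simple majority vote over the T base learners.
votes = np.stack([t.predict(X) for t in learners])  # shape (T, N)
y_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print((y_pred == y).mean())  # ensemble accuracy on the training data
```

In practice the loop would be replaced by `sklearn.ensemble.BaggingClassifier`, which implements the same bootstrap-then-vote scheme.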

### 2.1 Advantages of Bagging

Each base learner in Bagging is trained on a different dataset that covers only part of the original training data: because bootstrap sampling is sampling *with replacement*, some samples appear multiple times in a given bootstrap sample while others do not appear at all. As a result, Bagging is not prone to overfitting.
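The "covers only part of the original data" point can be quantified: a bootstrap sample of size $$N$$ contains, in expectation, about $$1 - 1/e \approx 63.2\%$$ of the distinct original samples. A quick numerical check (the value of $$N$$ here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Draw one bootstrap sample of size N and count how many
# distinct original samples it actually contains.
idx = rng.integers(0, N, size=N)
frac = len(np.unique(idx)) / N
print(frac)  # close to 1 - 1/e ≈ 0.632
```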

## 3 Random Forest

(1) Random Forest uses decision trees as its base learners; Bagging is a general framework that can use any algorithm as the base learner.
(2) Random Forest introduces "random attribute selection" into the training of each decision tree.

"Random attribute selection" works as follows: when splitting a node, a conventional decision tree chooses the best attribute among all $$d$$ available attributes; in a Random Forest, a random subset of $$k$$ attributes is first drawn at each node, and the best split attribute is then chosen from that subset. A commonly recommended value is $$k = \log_2 d$$.
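As an illustration with scikit-learn (one possible library choice, not mandated by the text), the per-node random attribute selection is controlled by the `max_features` parameter; `"log2"` corresponds to the commonly recommended $$k = \log_2 d$$. The dataset here is synthetic and arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# max_features="log2": at each node, only log2(d) randomly chosen
# attributes are considered as split candidates.
rf = RandomForestClassifier(
    n_estimators=100, max_features="log2", random_state=0
).fit(X, y)
print(rf.score(X, y))  # accuracy on the training data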

## 4 Random Forest vs. Gradient-Boosted Tree

Both Gradient-Boosted Trees (GBTs) and Random Forests are algorithms for learning ensembles of trees, but the training processes are different. There are several practical trade-offs:
(1) GBTs train one tree at a time, so they can take longer to train than random forests. Random Forests can train multiple trees in parallel.
(2) Random Forests can be less prone to overfitting. Training more trees in a Random Forest reduces the likelihood of overfitting, but training more trees with GBTs increases the likelihood of overfitting. (In statistical language, Random Forests reduce variance by using more trees, whereas GBTs reduce bias by using more trees.)
(3) Random Forests can be easier to tune since performance improves monotonically with the number of trees (whereas performance can start to decrease for GBTs if the number of trees grows too large).
In short, both algorithms can be effective, and the choice should be based on the particular dataset.
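The trade-offs above can be explored side by side. The sketch below is illustrative rather than a benchmark: the synthetic dataset, tree counts, and default hyperparameters are arbitrary choices, and relative performance will differ on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random Forest: trees are independent, so adding more mainly reduces variance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# GBT: trees are fit sequentially, each correcting the previous ensemble.
gbt = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("RF :", rf.score(X_te, y_te))
print("GBT:", gbt.score(X_te, y_te))
```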