This post introduces subset(). You pass it the data frame, a condition that selects rows, and the columns to keep (select).
> name <- c("A", "B", "C", "D", "E")
> age <- c(21, 19, 24, 21, 22)
> gender <- c("M", "F", "M", "M", "F")
> room <- c("201", "511", "210", "211", "508")
> df <- data.frame(name, age, gender, room)
> df
  name age gender room
1    A  21      M  201
2    B  19      F  511
3    C  24      M  210
4    D  21      M  211
5    E  22      F  508
> newDf <- subset(df, age > 20 & gender == "M", select = name:gender)
> newDf
  name age gender
1    A  21      M
3    C  24      M
4    D  21      M
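For comparison, the same selection can be sketched with plain bracket indexing. This is only an illustration of what the subset() call above does; the variable names viaSubset, viaBracket, and keep are made up for this example.

```r
# Rebuild the example data frame so this block runs on its own.
df <- data.frame(
  name   = c("A", "B", "C", "D", "E"),
  age    = c(21, 19, 24, 21, 22),
  gender = c("M", "F", "M", "M", "F"),
  room   = c("201", "511", "210", "211", "508")
)

# subset(): the condition and the column range use bare column names.
viaSubset <- subset(df, age > 20 & gender == "M", select = name:gender)

# Bracket indexing: a logical row filter, then column names as strings.
keep <- df$age > 20 & df$gender == "M"
viaBracket <- df[keep, c("name", "age", "gender")]

print(viaBracket)
```

The two forms agree here because the data has no missing values. When the condition evaluates to NA for a row, subset() drops that row, while bracket indexing returns a row of NAs in its place.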
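For random sampling, use sample(). The first argument is the set of row indices to draw from, the second is the number of rows to draw, and the third (replace) controls whether the same row may be picked more than once. If rows cannot be repeated and the requested sample size is larger than the number of rows, R raises an error.
> df
  name age gender room
1    A  21      M  201
2    B  19      F  511
3    C  24      M  210
4    D  21      M  211
5    E  22      F  508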
> df[sample(1:nrow(df), 2, replace=FALSE), ]
  name age gender room
4    D  21      M  211
3    C  24      M  210
> df[sample(1:nrow(df), 2, replace=TRUE), ]
  name age gender room
3    C  24      M  210
2    B  19      F  511
> df[sample(1:nrow(df), 10, replace=TRUE), ]
    name age gender room
5      E  22      F  508
5.1    E  22      F  508
5.2    E  22      F  508
5.3    E  22      F  508
2      B  19      F  511
1      A  21      M  201
1.1    A  21      M  201
1.2    A  21      M  201
1.3    A  21      M  201
5.4    E  22      F  508
> df[sample(1:nrow(df), 10, replace=FALSE), ]
Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'
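The sampling calls above return different rows on every run, and the last one fails only once it reaches sample(). As a minimal sketch, a small wrapper can check the size up front and set.seed() can make the draw repeatable. Note that sample_rows() is a hypothetical helper written for this example, not a base R function, and the seed value 42 is arbitrary.

```r
# Rebuild the example data frame so this block runs on its own.
df <- data.frame(
  name   = c("A", "B", "C", "D", "E"),
  age    = c(21, 19, 24, 21, 22),
  gender = c("M", "F", "M", "M", "F"),
  room   = c("201", "511", "210", "211", "508")
)

# sample_rows() is a hypothetical helper, not part of base R.
sample_rows <- function(data, n, replace = FALSE) {
  # Fail early with a clearer message than the one from sample.int().
  if (!replace && n > nrow(data)) {
    stop("n exceeds the number of rows; use replace = TRUE or a smaller n")
  }
  # seq_len() is safer than 1:nrow(data), which yields c(1, 0) for 0 rows.
  idx <- sample(seq_len(nrow(data)), n, replace = replace)
  data[idx, , drop = FALSE]
}

set.seed(42)  # fix the random seed so the same rows come back on each run
picked <- sample_rows(df, 2)
print(picked)

# Drawing more rows than exist is allowed only with replacement.
repeated <- sample_rows(df, 10, replace = TRUE)
```

Calling sample_rows(df, 10) without replace = TRUE stops with the helper's own message instead of the sample.int() error.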