Sunday, 15 September 2013

R - predict.randomForest() returns empty rows beneath correct data with type="prob" selected




Here is some dummy code using the iris dataset that reproduces the problem I'm having.

iris <- read.csv("~/rdata/iris.csv")
library(randomForest)
fit <- randomForest(Species ~ ., data=iris)
iris$guess <- predict(fit, type="prob")
View(iris)

You'll see that the new guess column is 450 records long, while iris is 150 records long. The predictions seem correct, and the problem goes away if I remove the type="prob" code.

The explanation of the type argument at ?predict.randomForest indicates that with type='prob', you'll receive a matrix of predicted probabilities for the different potential response classes.

You'll see this when you look at the predict output itself:

head(predict(fit, type="prob"), 10)
#       setosa  versicolor virginica
# 1  1.0000000 0.000000000         0
# 2  1.0000000 0.000000000         0
# 3  1.0000000 0.000000000         0
# 4  1.0000000 0.000000000         0
# 5  1.0000000 0.000000000         0
# 6  1.0000000 0.000000000         0
# 7  1.0000000 0.000000000         0
# 8  1.0000000 0.000000000         0
# 9  0.9945355 0.005464481         0
# 10 1.0000000 0.000000000         0

Since there are 3 classes (species) and 150 observations, there are 450 predictions. When the matrix of predictions is added to the data.frame, R removes the dimensions and adds it as a single long column.
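To make the 150 x 3 shape concrete, here is a minimal sketch. It assumes the built-in iris dataset rather than the CSV from the question, an arbitrary seed, and a hypothetical copy named iris_probs with hypothetical prob_ column names. It stores each class probability in its own column, which is one way (not necessarily the only one) to keep one row per observation.

```r
library(randomForest)

data(iris)      # assumption: built-in dataset instead of the CSV file
set.seed(1)     # arbitrary seed, only for repeatability
fit <- randomForest(Species ~ ., data = iris)

# With type = "prob" the result is a matrix: one row per observation,
# one column per class.
probs <- predict(fit, type = "prob")
print(dim(probs))     # rows and columns of the matrix
print(length(probs))  # total number of probabilities

# Add one column per class instead of assigning the whole matrix at once.
iris_probs <- iris    # hypothetical name, keeps the original untouched
for (cls in colnames(probs)) {
  iris_probs[[paste0("prob_", cls)]] <- probs[, cls]
}
print(nrow(iris_probs))
```

Each new column is an ordinary numeric vector of length 150, so the data.frame keeps its original number of rows.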

If you keep type at its default, 'response', R will return the class that has the highest predicted probability. For example, compare:

levels(iris$Species)[apply(predict(fit,type="prob"), 1, which.max)]

with

predict(fit)
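The comparison can also be done programmatically. The sketch below again assumes the built-in iris dataset and an arbitrary seed; the names from_prob, from_response and agreement are hypothetical. Exact ties between class probabilities may be resolved differently by which.max and by predict, so it reports the share of matching labels rather than assuming every row agrees.

```r
library(randomForest)

data(iris)      # assumption: built-in dataset instead of the CSV file
set.seed(1)     # arbitrary seed, only for repeatability
fit <- randomForest(Species ~ ., data = iris)

probs <- predict(fit, type = "prob")

# Class with the highest predicted probability in each row.
from_prob <- colnames(probs)[apply(probs, 1, which.max)]

# Default type = "response" returns a factor of predicted classes.
from_response <- as.character(predict(fit))

# Share of observations where the two approaches give the same label.
agreement <- mean(from_prob == from_response)
print(agreement)
```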

r statistics random-forest
