My Blog: R - predict.randomForest() returns empty rows beneath correct data with type="prob" selected

Sunday, 15 September 2013

R - predict.randomForest() returns empty rows beneath correct data with type="prob" selected

Here is some dummy code, using the iris dataset, that reproduces the problem I'm having.

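    library(randomForest)

    # stringsAsFactors = TRUE keeps Species a factor, which randomForest
    # needs in order to fit a classification forest
    iris <- read.csv("~/rdata/iris.csv", stringsAsFactors = TRUE)
    fit <- randomForest(Species ~ ., data=iris)

    iris$guess <- predict(fit, type="prob")
    View(iris)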

You will see that the new guess column is 450 records long, while iris is 150 records long. The predictions seem correct, and the problem goes away if I remove type="prob" from the code.

The explanation of the type argument at ?predict.randomForest indicates that with type='prob' you'll receive a matrix of predicted probabilities for the different potential response classes.

You'll see this when you look at the predict output itself:

    head(predict(fit, type="prob"), 10)
    #       setosa  versicolor virginica
    # 1  1.0000000 0.000000000         0
    # 2  1.0000000 0.000000000         0
    # 3  1.0000000 0.000000000         0
    # 4  1.0000000 0.000000000         0
    # 5  1.0000000 0.000000000         0
    # 6  1.0000000 0.000000000         0
    # 7  1.0000000 0.000000000         0
    # 8  1.0000000 0.000000000         0
    # 9  0.9945355 0.005464481         0
    # 10 1.0000000 0.000000000         0
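The shape of that result can be checked directly. The following is a minimal sketch that assumes the iris data set bundled with R (with its capitalised Species column) rather than the CSV copy used in the question; the seed value is arbitrary and only there for reproducibility.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, an assumption made only for reproducibility
fit <- randomForest(Species ~ ., data = iris)

probs <- predict(fit, type = "prob")

# One row per observation, one column per class
dim(probs)       # 150 3
colnames(probs)  # the three species names
length(probs)    # total number of values held in the matrix: 450
```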

Since there are 3 classes (species) and 150 observations, there are 450 predictions. When this matrix of predictions is assigned to a single column of a data.frame, all 450 values end up in that one column, so it shows up as a single long column.
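If the goal is to keep the probabilities alongside the original 150 rows, one option is to bind them on as separate columns, one per class. This is only a sketch under the same assumption of the built-in iris data; the names prob_df and iris_with_probs and the prob_ prefix are illustrative choices, not anything from the original post.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, an assumption made only for reproducibility
fit <- randomForest(Species ~ ., data = iris)
probs <- predict(fit, type = "prob")

# Turn the 150 x 3 probability matrix into a data frame and prefix the
# column names so they are easy to tell apart from the original columns.
prob_df <- as.data.frame(unclass(probs))
names(prob_df) <- paste0("prob_", colnames(probs))

# Column-bind: still 150 rows, now with three extra probability columns.
iris_with_probs <- cbind(iris, prob_df)
dim(iris_with_probs)  # 150 8
```

Each class then has its own numeric column, and the row count matches the original data.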

If you keep type at its default, 'response', R will return the class that has the highest predicted probability. For example, compare:

    levels(iris$Species)[apply(predict(fit, type="prob"), 1, which.max)]

with

    predict(fit)
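To make that comparison concrete, the two sets of labels can be cross-tabulated. This sketch again assumes the built-in iris data and an arbitrary seed; from_probs and from_response are names introduced here for illustration.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, an assumption made only for reproducibility
fit <- randomForest(Species ~ ., data = iris)

# Labels derived by hand from the probability matrix
probs <- predict(fit, type = "prob")
from_probs <- factor(colnames(probs)[apply(probs, 1, which.max)],
                     levels = levels(iris$Species))

# Labels returned directly with the default type = "response"
from_response <- predict(fit)

# Off-diagonal counts, if any, would come from tied probabilities,
# which which.max() and predict() may resolve differently.
table(from_probs, from_response)
mean(from_probs == from_response)  # share of rows where the two agree
```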

r statistics random-forest
