R - predict.randomForest() returns empty rows beneath correct data when type="prob" is selected
Here is some dummy code using the iris dataset that reproduces the problem I'm having:
iris <- read.csv("~/rdata/iris.csv")
library(randomForest)
fit <- randomForest(Species ~ ., data = iris)
iris$guess <- predict(fit, type = "prob")
View(iris)

You'll see that the new guess column is 450 records long, while iris is only 150 records long. The predictions seem correct, and the problem goes away if I remove type="prob" from the code.
The explanation of the type argument in ?predict.randomForest indicates that with type='prob' you'll receive a matrix of predicted probabilities for the different possible response classes.
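A minimal sketch of that difference, assuming the randomForest package is installed and using R's built-in iris data frame instead of the CSV from the question. The seed value is arbitrary and only makes the forest reproducible:

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility
fit <- randomForest(Species ~ ., data = iris)

# Default type = "response": one predicted class (a factor) per observation
resp <- predict(fit)
str(resp)

# type = "prob": a matrix with one row per observation and one column per class
prob <- predict(fit, type = "prob")
dim(prob)       # 150 3
colnames(prob)  # the class labels of Species
```

So the "response" result lines up with the 150 rows of iris, while the "prob" result carries three values per row.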
You'll see this when you look at the predict output itself:
head(predict(fit, type = "prob"), 10)
#       setosa  versicolor virginica
# 1  1.0000000 0.000000000         0
# 2  1.0000000 0.000000000         0
# 3  1.0000000 0.000000000         0
# 4  1.0000000 0.000000000         0
# 5  1.0000000 0.000000000         0
# 6  1.0000000 0.000000000         0
# 7  1.0000000 0.000000000         0
# 8  1.0000000 0.000000000         0
# 9  0.9945355 0.005464481         0
# 10 1.0000000 0.000000000         0

Since there are 3 classes (species) and 150 observations, there are 450 predicted probabilities. When you put this matrix of predictions into a single column of the data.frame, R drops its dimensions and shows it as one long column.
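A sketch of the arithmetic and of one way to keep one row per observation, again assuming the built-in iris data. Rather than assigning the matrix to a single column, it binds the probabilities as three separate columns with cbind(). The names probs_df and iris2 and the "prob." prefix are purely illustrative, and this is only one of several reasonable approaches:

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility
fit <- randomForest(Species ~ ., data = iris)
prob <- predict(fit, type = "prob")

nrow(prob) * ncol(prob)  # 150 observations x 3 classes = 450
length(prob)             # the same 450 values once the matrix is flattened

# Give each class its own column so the data frame keeps 150 rows
probs_df <- as.data.frame(unclass(prob))
names(probs_df) <- paste0("prob.", names(probs_df))
iris2 <- cbind(iris, probs_df)
dim(iris2)  # 150 8: the 5 original columns plus 3 probability columns
```

Each row of iris2 then holds the three class probabilities for that flower, which should sum to 1.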
If you keep type at its default, 'response', R returns the class that has the highest predicted probability. For example, compare:
levels(iris$Species)[apply(predict(fit, type = "prob"), 1, which.max)]

with
predict(fit)
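A sketch of that comparison on the built-in iris data. It takes the class names from colnames() of the probability matrix, which holds the same labels as levels(iris$Species). Because tied votes in randomForest may be broken at random while which.max() always picks the first maximum, a few borderline rows could in principle disagree, so the sketch reports the share of matching rows rather than assuming an exact match:

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility
fit <- randomForest(Species ~ ., data = iris)
prob <- predict(fit, type = "prob")

# Class with the highest predicted probability in each row
top <- colnames(prob)[apply(prob, 1, which.max)]

# Default type = "response": the majority-vote class
resp <- as.character(predict(fit))

mean(top == resp)  # share of rows where both approaches agree
```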