R - predict.randomForest() returns empty rows beneath correct data with type="prob" selected
Here is some dummy code using the iris dataset that reproduces the problem I'm having.
iris <- read.csv("~/rdata/iris.csv")
library(randomForest)
fit <- randomForest(Species ~ ., data=iris)
iris$guess <- predict(fit, type="prob")
View(iris)
As you can see, the new guess column is 450 records long, while iris is 150 records long. The predictions seem correct, and the problem goes away if I remove type="prob" from the code.
The explanation of the type argument at ?predict.randomForest indicates that with type='prob' you'll receive a matrix of predicted probabilities for the different potential response classes.
You'll see this when you look at the predict outcome itself:
head(predict(fit, type="prob"), 10)
#       setosa  versicolor virginica
# 1  1.0000000 0.000000000         0
# 2  1.0000000 0.000000000         0
# 3  1.0000000 0.000000000         0
# 4  1.0000000 0.000000000         0
# 5  1.0000000 0.000000000         0
# 6  1.0000000 0.000000000         0
# 7  1.0000000 0.000000000         0
# 8  1.0000000 0.000000000         0
# 9  0.9945355 0.005464481         0
# 10 1.0000000 0.000000000         0
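Since there are 3 classes (species) and 150 observations, there are 450 predictions. When that matrix of predictions is assigned to a single data frame column, all 450 values go into it instead of one value per row, which is why the column looks 450 records long.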
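One way to check the shape of the result and keep one row per observation is sketched below. It assumes the built-in iris data from the datasets package (so the response column is named Species) rather than the CSV file, and the names probs and iris_probs are only illustrative.

```r
library(randomForest)

set.seed(42)                      # arbitrary seed, only for repeatability
data(iris)                        # built-in copy, assumed equivalent to the CSV
fit <- randomForest(Species ~ ., data = iris)

probs <- predict(fit, type = "prob")
dim(probs)      # rows = observations, columns = classes
length(probs)   # total number of probabilities stored in the matrix

# Bind the matrix column-wise so each class probability gets its own column
iris_probs <- cbind(iris, probs)
dim(iris_probs)
```

With this layout each observation keeps a single row, and the three class probabilities sit side by side instead of in one column.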
If you keep type at its default, 'response', R will return the class that has the highest predicted probability. For example, compare:
levels(iris$Species)[apply(predict(fit, type="prob"), 1, which.max)]
with
predict(fit)
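The comparison can be run end to end as in the sketch below, again assuming the built-in iris data instead of the CSV file; by_prob and by_class are illustrative names. Because which.max takes the first maximum in a row, the two results may disagree on observations whose top probabilities are tied, so the sketch reports the share of agreement rather than assuming the vectors are identical.

```r
library(randomForest)

set.seed(42)                      # arbitrary seed, only for repeatability
data(iris)                        # built-in copy, assumed equivalent to the CSV
fit <- randomForest(Species ~ ., data = iris)

probs <- predict(fit, type = "prob")

# Class with the highest probability in each row (first one wins on ties)
by_prob <- colnames(probs)[apply(probs, 1, which.max)]

# Default type = "response" returns the predicted class directly
by_class <- as.character(predict(fit))

# Share of observations where the two approaches agree
mean(by_prob == by_class)
```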