python - scikit-learn cross_validation: need more info about the resulting score
I'm attempting to generate "which engine works best" information for a project I'm on. The general idea is simple: pick an engine, run cross-validation, generate a list of cross-validation results, and the one with the biggest score is "best." All tests are done on the same set of training data. Here's a snippet of the idea. I'd set this up in a loop, and instead of setting simple_clf to svm.SVC() I'd have a loop over the engines and run the rest of the code for each one. The base data is in featurevecs, and scorenums contains the corresponding score value, 0 to 9, that a particular base data item is supposed to produce.
    from sklearn import svm, grid_search, cross_validation
    from sklearn.cross_validation import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        featurevecs, scorenums, test_size=0.333, random_state=0)

    # this would sit in a loop over the engine types; for now I'm making sure
    # the basic code works with a single engine
    simple_clf = svm.SVC()
    simple_clf = grid_search.GridSearchCV(simple_clf, clfparams, cv=3)
    simple_clf.fit(X_train, y_train)
    kf = cross_validation.KFold(len(X_train), k=5)
    scores = cross_validation.cross_val_score(simple_clf, X_test, y_test, cv=kf)
    print scores.mean(), scores.std() / 2
    # the loop would end here

My problem is that scores isn't usable for saying which engine is "best." All scores gives me is the .mean() and .std() that I print. And I don't want an engine's result to count only exact matches; I want "close" matches too. In this case, close means a numeric score within 1 of the expected score: if the expected score is 3, then either 2, 3 or 4 would be considered a match and counted in the result.
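Just to make that definition concrete, here's a tiny illustration of my own (not part of the real pipeline) of what a "within 1" accuracy would look like on a handful of predictions:

    import numpy as np

    y_true = np.array([3, 7, 0])
    y_pred = np.array([4, 7, 2])

    # a prediction counts as a match if it is within 1 of the expected score
    close_accuracy = np.mean(np.abs(y_true - y_pred) <= 1)
    print close_accuracy  # 2 of the 3 predictions are within 1, so 0.666...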
I looked through the documentation, and it seems the latest bleeding-edge version of scikit-learn has an additional metrics package that allows a custom score function to be passed to grid search, but I'm unsure if that's enough for what I need, because I'd need to be able to pass it to the cross_val_score function, not to grid_search, no? Regardless, that isn't an option, since I'm locked into the version of scikit-learn I have to use.
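From what I can tell reading the newer docs, that interface looks roughly like this (a sketch only, not something I can run on my version; make_scorer and the scoring argument only exist in later releases, and in current releases the imports live in sklearn.metrics and sklearn.model_selection; simple_clf, X_test and y_test are the objects from my snippet above):

    import numpy as np
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import cross_val_score

    def within_one_score(y_true, y_pred):
        # fraction of predictions within 1 of the expected score
        return np.mean(np.abs(y_true - y_pred) <= 1)

    # wrap the custom metric and hand it to cross_val_score via scoring=
    close_scorer = make_scorer(within_one_score, greater_is_better=True)
    scores = cross_val_score(simple_clf, X_test, y_test, cv=5, scoring=close_scorer)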
I also noted a reference to cross_val_predict in the latest bleeding-edge version, which seems like exactly what I need, but once again I'm locked into the version I have to use.
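If I could use it, my understanding is it would look something like this (again a sketch of the newer API that I can't run on my version; in current releases the import comes from sklearn.model_selection):

    import numpy as np
    from sklearn.model_selection import cross_val_predict

    # get out-of-fold predictions for every training item, then apply my own
    # definition of a match to them
    y_pred = cross_val_predict(simple_clf, X_train, y_train, cv=5)
    close_accuracy = np.mean(np.abs(np.asarray(y_train) - y_pred) <= 1)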
What was done before the bleeding edge, when the definition of "good" for cross-validation wasn't the default exact match? It has certainly been done before. I just need to be pointed in the right direction.
I'm stuck at version 0.11 of scikit-learn because of corporate policy: I can only use approved software, the version was approved a while ago, and upgrading isn't an alternative for me.
Here's what I changed things to, using the helpful hint to look at cross_val_score in the 0.11 docs and finding that it can take a custom score function, and that I can write my own as long as it matches the expected parameters. This is the code I have now. It does what I'm looking for: generating results based not only on an exact match but also on a "close" match, where close is defined as within 1.
    # kludgey way of switching the test between exact match and close match
    from __future__ import print_function
    import numpy as np

    score_count = 0
    score_crossover_count = 0

    def my_custom_score_function(y_true, y_pred):
        # kludgey way of switching the test between exact match and close match
        global score_count, score_crossover_count
        if score_count < score_crossover_count:
            close_applies = False
        else:
            close_applies = True
        score_count += 1
        print(close_applies, score_crossover_count, score_count)
        deltas = np.abs(y_true - y_pred)
        count = 0
        for delta in deltas:
            if delta == 0:
                count += 1
            elif close_applies and delta == 1:
                count += 1
        answer = float(count) / float(len(y_true))
        return answer

And the code snippet from the main routine:
    fold_count = 5
    # kludgey way of switching the test between exact match and close match:
    # set the global variables used by the custom scorer function
    global score_count, score_crossover_count
    score_count = 0
    score_crossover_count = fold_count
    # simple cross validation
    simple_clf = svm.SVC()
    simple_clf = grid_search.GridSearchCV(simple_clf, clfparams, cv=3)
    simple_clf.fit(X_train, y_train)
    print('{0} '.format(test_type), end="")
    kf = cross_validation.KFold(len(X_train), k=fold_count)
    scores = cross_validation.cross_val_score(
        simple_clf, X_train, y_train, cv=kf, score_func=my_custom_score_function)
    print('accuracy (+/- 0) {1:0.4f} (+/- {2:0.4f}) '.format(
        scores, scores.mean(), scores.std() / 2), end="")
    scores = cross_validation.cross_val_score(
        simple_clf, X_train, y_train, cv=kf, score_func=my_custom_score_function)
    print('accuracy (+/- 1) {1:0.4f} (+/- {2:0.4f}) '.format(
        scores, scores.mean(), scores.std() / 2), end="")
    print("")
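A variant I'm considering to avoid the global-variable kludge (my own sketch, untested on 0.11, assuming score_func only needs a callable taking (y_true, y_pred)): two separate, stateless score functions, one per definition of a match, each passed to its own cross_val_score call.

    import numpy as np

    def exact_score(y_true, y_pred):
        # fraction of predictions that match the expected score exactly
        return np.mean(np.asarray(y_true) == np.asarray(y_pred))

    def within_one_score(y_true, y_pred):
        # fraction of predictions within 1 of the expected score
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)) <= 1)

    scores_exact = cross_validation.cross_val_score(
        simple_clf, X_train, y_train, cv=kf, score_func=exact_score)
    scores_close = cross_validation.cross_val_score(
        simple_clf, X_train, y_train, cv=kf, score_func=within_one_score)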
You can find the documentation for cross_val_score in 0.11 here. It lets you provide a custom score function through the score_func argument, although the interface is different from the newer releases. As an aside: why are you "locked into" your current version? scikit-learn is usually backward compatible for two releases.
python scikit-learn