Tuesday, 15 January 2013

python - how to split a dataset into training and validation sets keeping the ratio between classes?


I have a multi-class classification problem and my dataset is skewed: I have 100 instances of one particular class and only around 10 instances each of several other classes. I want to split my dataset while maintaining the ratio between classes. If I take 30% of the records for training, then from the class with 100 instances I want 30 examples in the training set, from a class represented by 10 records I want 3 examples, and so on.

You can use sklearn's StratifiedKFold. From the online docs:

Stratified K-Folds cross-validation iterator

Provides train/test indices to split data into train/test sets.

This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples of each class.

    >>> import numpy as np
    >>> from sklearn import cross_validation
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([0, 0, 1, 1])
    >>> skf = cross_validation.StratifiedKFold(y, n_folds=2)
    >>> len(skf)
    2
    >>> print(skf)
    sklearn.cross_validation.StratifiedKFold(labels=[0 0 1 1], n_folds=2, shuffle=False, random_state=None)
    >>> for train_index, test_index in skf:
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [1 3] TEST: [0 2]
    TRAIN: [0 2] TEST: [1 3]

This will preserve your class ratios, so that the split reflects the ratios of the classes, and it will work fine with pandas DataFrames as well.
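As a minimal sketch of what that might look like with a DataFrame (the names df, 'feature' and 'label' are made up for illustration, and this uses the same older sklearn.cross_validation API as above; in sklearn 0.18+ StratifiedKFold lives in sklearn.model_selection and takes the labels in split() instead of the constructor):

    import pandas as pd
    from sklearn import cross_validation

    # Hypothetical skewed DataFrame: 100 rows of class 0, 10 rows of class 1
    df = pd.DataFrame({
        'feature': range(110),
        'label': [0] * 100 + [1] * 10,
    })

    # Stratify on the label column; each fold keeps the 10:1 class ratio
    skf = cross_validation.StratifiedKFold(df['label'].values, n_folds=5)

    for train_index, test_index in skf:
        train_df = df.iloc[train_index]
        test_df = df.iloc[test_index]
        # Each training fold holds ~80 rows of class 0 and ~8 rows of class 1
        print(train_df['label'].value_counts().to_dict())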

As suggested by @eli_m, you can use StratifiedShuffleSplit, which accepts a split-ratio parameter:

    sss = StratifiedShuffleSplit(y, 3, test_size=0.7, random_state=0)

which will produce splits with 70% of the samples of each class in the test set, leaving roughly 30% of each class for training.
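As a rough sketch of how to check this (the labels y are made up to mirror the question, and this again assumes the older sklearn.cross_validation API used in this post):

    import numpy as np
    from sklearn.cross_validation import StratifiedShuffleSplit

    # Made-up skewed labels: 100 samples of class 0, 10 samples of class 1
    y = np.array([0] * 100 + [1] * 10)

    # test_size=0.7 puts 70% of each class in the test set,
    # so roughly 30% of each class remains for training
    sss = StratifiedShuffleSplit(y, n_iter=3, test_size=0.7, random_state=0)

    for train_index, test_index in sss:
        # Expect about 30 training samples of class 0 and 3 of class 1
        print("train class counts:", np.bincount(y[train_index]))
        print("test class counts: ", np.bincount(y[test_index]))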

