Machine Learning - splitting data into training and test sets

Question


How can I split a given dataset into training and test sets while keeping each data point paired with its correct label?

scikit-learn has a function that comes close:

# sklearn.cross_validation in releases before 0.18
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2)

where df is the original dataset, for example a list of strings.

The problem is that, called this way, it does not receive the targets/labels along with the data, so after the split we cannot tell which label belongs to which data point.

Is there a way to bind the data points to their labels and then split them into training and test sets?
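To make the "binding" asked about here concrete, below is a minimal sketch that uses only the Python standard library. The helper name split_pairs and its seed parameter are made up for illustration, and this is not how scikit-learn implements its split; it only shows that shuffling (data, label) pairs together keeps them aligned.

```python
import random

def split_pairs(data, labels, test_size=0.2, seed=None):
    """Hypothetical helper: shuffle (data, label) pairs together, then split."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    pairs = list(zip(data, labels))      # bind each data point to its label
    random.Random(seed).shuffle(pairs)   # shuffle the pairs, not the two lists separately
    n_test = max(1, round(len(pairs) * test_size))  # assumption: keep at least one test item
    test, train = pairs[:n_test], pairs[n_test:]
    return train, test

df = ['the', 'quick', 'brown', 'fox', 'jumps']
labels = [0, 1, 0, 0, 1]
train, test = split_pairs(df, labels, test_size=0.2, seed=0)

# Unzip the pairs back into parallel sequences when a model needs them separately
train_x, train_y = zip(*train)
```

Because every element of train and test is a (data point, label) tuple, the pairing cannot be lost during the shuffle.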


scikit-learn machine-learning

mach 24 Sep 15 at 5:57


1 answer

Ami tavory · Accepted Answer · 2015-09-24T06:44:38+0000

train_test_split (sklearn.cross_validation in older releases, sklearn.model_selection since scikit-learn 0.18) accepts a variable number of arrays to split. From its documentation:

*arrays : sequence of arrays or scipy.sparse matrices with the same shape[0]

Returns:
splitting : list of arrays, length = 2 * len(arrays)
    List containing the train-test split of the input arrays.

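So you can simply pass the list of labels as a second array:

# sklearn.cross_validation in releases before 0.18
from sklearn.model_selection import train_test_split

df = ['the', 'quick', 'brown', 'fox']
labels = [0, 1, 0, 0]

# One possible result; the split is random unless random_state is set
>>> train_test_split(df, labels, test_size=0.2)
[['quick', 'fox', 'the'], ['brown'], [1, 0, 0], [0]]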
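
The flat result is easier to work with once it is unpacked into four names. The sketch below is a hypothetical pure-Python stand-in (split_arrays and its seed parameter are invented for illustration; it is not scikit-learn's code) that mimics the return layout quoted above: each input array contributes a train part followed by a test part, and one shared permutation keeps the data and labels aligned.

```python
import math
import random

def split_arrays(*arrays, test_size=0.2, seed=None):
    """Hypothetical stand-in that mimics the documented return layout."""
    if not arrays:
        raise ValueError("at least one array is required")
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # one permutation shared by every array
    n_test = math.ceil(n * test_size)      # assumption: round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    result = []
    for a in arrays:
        result.append([a[i] for i in train_idx])  # train part of this array
        result.append([a[i] for i in test_idx])   # test part of this array
    return result  # length == 2 * len(arrays)

df = ['the', 'quick', 'brown', 'fox']
labels = [0, 1, 0, 0]
x_train, x_test, y_train, y_test = split_arrays(df, labels, test_size=0.2, seed=42)
```

The same unpacking pattern, x_train, x_test, y_train, y_test = train_test_split(df, labels, test_size=0.2), applies to the scikit-learn call shown in the answer.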
