Sunday, 15 March 2015

Python - Linear Regression with sklearn using categorical variables




I'm trying to run a usual linear regression in Python using scikit-learn, but I have some categorical data that I don't know how to handle, especially because I imported the data using pandas read_csv(), and I have learned from previous experience and reading that pandas and scikit-learn don't get along quite well (yet).

My data looks like this:

    salary  atbat  hits  league  eastdivision
    475     315    81    1       0
    480     479    130   0       0
    500     496    141   1       1

I want to predict salary using atbat, hits, league and eastdivision, where league and eastdivision are categorical.

If I import the data via numpy's loadtxt(), I get a numpy array which in theory I could use with sklearn, but when I use DictVectorizer I get an error. My code is:

    import numpy as np
    from sklearn.feature_extraction import DictVectorizer as DV

    nphitters = np.loadtxt('hitters.csv', delimiter=',', skiprows=1)
    vec = DV(sparse=False)
    catl = vec.fit_transform(nphitters[:, 3:4])

I get the error when I run the last line, catl = vec.fit_transform(nphitters[:, 3:4]). The error is:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 142, in fit_transform
        self.fit(X)
      File "/usr/lib/python2.7/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 107, in fit
        for f, v in six.iteritems(x):
      File "/usr/lib/python2.7/dist-packages/sklearn/externals/six.py", line 268, in iteritems
        return iter(getattr(d, _iteritems)())
    AttributeError: 'numpy.ndarray' object has no attribute 'iteritems'

I don't know how to fix it. Another thing is, once I get the categorical data working, how do I run the regression? Just as if the categorical variable were another numeric variable?

I have found several questions similar to mine, but none of them have worked for me.

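Basically, what happens is that you are passing a vector of 1s and 0s to a function that takes keys and values (like a dictionary) and creates a table for you. DictVectorizer expects a list of dicts, one per sample, and calls iteritems() on each of them, which is why a numpy array raises the AttributeError above.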

    D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]

will become

    array([[ 2.,  0.,  1.],
           [ 0.,  1.,  3.]])

or

    | bar | baz | foo |
    |-----|-----|-----|
    |  2  |  0  |  1  |
    |  0  |  1  |  3  |

Read more: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
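The foo/bar/baz example can be reproduced with a short sketch like the one below. It also shows what DictVectorizer does with string-valued features, which it one-hot encodes. The `records` list and its league codes 'A' and 'N' are made up for illustration and are not taken from the question's data.

```python
from sklearn.feature_extraction import DictVectorizer

# Each sample is a dict mapping feature name -> value.
D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(D)

# vocabulary_ maps feature name -> column index; sort the names by index
# to recover the column order of X.
feature_names = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
print(feature_names)
print(X)

# Hypothetical records with a string-valued categorical feature.
# String values become one indicator column per (name, value) pair.
records = [{'league': 'A', 'hits': 81}, {'league': 'N', 'hits': 130}]
vec2 = DictVectorizer(sparse=False)
X2 = vec2.fit_transform(records)
names2 = sorted(vec2.vocabulary_, key=vec2.vocabulary_.get)
print(names2)
print(X2)
```

Numeric values are passed through unchanged, so feeding DictVectorizer columns that are already 0/1 would not add anything even if they were wrapped in dicts.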

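In this case, your data is already ready for a linear regression, because the features league and eastdivision are dummies (0/1 indicators) already, so they can be passed to the model like any other numeric column.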
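As a minimal sketch of that last step, assuming the columns are named as in the question, the regression could be run directly on the pandas DataFrame. The three sample rows are inlined here so the snippet is self-contained; with the real file you would presumably call pd.read_csv('hitters.csv') instead. Three rows are far too few for a meaningful fit, so the coefficients only show the mechanics.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# The three sample rows from the question, inlined as CSV text
# (assumption: the real file uses the same column names).
csv_text = """salary,atbat,hits,league,eastdivision
475,315,81,1,0
480,479,130,0,0
500,496,141,1,1
"""
hitters = pd.read_csv(io.StringIO(csv_text))

# league and eastdivision are already 0/1 dummies, so they go in
# alongside the numeric columns without further encoding.
feature_cols = ['atbat', 'hits', 'league', 'eastdivision']
X = hitters[feature_cols]
y = hitters['salary']

model = LinearRegression()
model.fit(X, y)

# One coefficient per feature, in the order of feature_cols.
print(dict(zip(feature_cols, model.coef_)))
print(model.intercept_)

predictions = model.predict(X)
```

scikit-learn accepts the DataFrame directly here, so there is no need to go through numpy's loadtxt() for this case.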

python scikit-learn linear-regression categorical-data
