Python - How to construct training vectors of word n-grams using TF-IDF
My task is text classification with an SVM, using word n-gram features. Before using TF-IDF, my code was:
    word_dic = ngram.wordngrams(text, n)
    # count of each global n-gram in this document (0 if it does not occur)
    freq_term_vector = [word_dic[gram] if gram in word_dic else 0 for gram in global_vector]
    x.append(freq_term_vector)
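For reference, a self-contained sketch of this count-vector step. word_ngrams is a hypothetical stand-in for the author's ngram.wordngrams (assumed here to split on whitespace and return a dict-like mapping from n-gram to count), and global_vector is assumed to be a fixed, ordered list of every n-gram seen in the training documents.

```python
from collections import Counter


def word_ngrams(text, n):
    """Hypothetical stand-in for ngram.wordngrams: count word n-grams in text."""
    words = text.split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(grams)


def count_vector(text, n, global_vector):
    """One row of raw counts, aligned to the shared n-gram vocabulary."""
    word_dic = word_ngrams(text, n)
    # Counter returns 0 for missing keys, matching the original "else 0" branch.
    return [word_dic[gram] for gram in global_vector]


docs = ["the cat sat on the mat", "the dog sat on the log"]
n = 2
# Shared vocabulary: every bigram seen in the training documents, in a fixed order.
global_vector = sorted(set().union(*(word_ngrams(d, n) for d in docs)))
x = [count_vector(d, n, global_vector) for d in docs]
```

Because every row is indexed by the same global_vector, all rows have the same length, so x can be turned into a single 2-D (n_samples, n_features) matrix.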
With these raw counts, it works well. However, when I tried TF-IDF with the code below:
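    from sklearn.feature_extraction.text import TfidfTransformer
    # counts for the current document, aligned to global_vector
    freq_term_vector = [word_dic[gram] if gram in word_dic else 0 for gram in global_vector]
    tfidf = TfidfTransformer(norm="l2")
    # fit and transform on this one document's counts, then append the result
    tfidf.fit(freq_term_vector)
    x.append(tfidf.transform(freq_term_vector).toarray())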
the training part runs, but when the program reaches the prediction part, it raises:
    clf.predict(x_test)
      File "/usr/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 223, in predict
        scores = self.decision_function(X)
      File "/usr/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 207, in decision_function
        dense_output=True) + self.intercept_
      File "/usr/lib/python2.7/dist-packages/sklearn/utils/extmath.py", line 83, in safe_sparse_dot
        return np.dot(a, b)
    ValueError: shapes (1100,1,38) and (1,11) not aligned: 38 (dim 2) != 1 (dim 0)
The training method and the prediction method build their features the same way. How can I solve this alignment problem? Could you help me check the code above or give me an idea?
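The 3-D shape in the traceback can be reproduced with plain NumPy. In this sketch, dummy rows stand in for tfidf.transform(freq_term_vector).toarray(), which is assumed to return a (1, 38) array per document; the document count of 5 is arbitrary.

```python
import numpy as np

n_docs, n_features = 5, 38
rows = []
for _ in range(n_docs):
    # Stand-in for a per-document transform(...).toarray(): shape (1, n_features).
    one_row = np.ones((1, n_features))
    rows.append(one_row)

stacked = np.array(rows)
print(stacked.shape)  # (5, 1, 38): 3-D, like the (1100, 1, 38) in the error

# Stacking the rows vertically gives the 2-D matrix scikit-learn expects.
fixed = np.vstack(rows)
print(fixed.shape)  # (5, 38)
```

Reshaping alone only fixes the dimensions, though: each row was still weighted by a transformer fitted on that single document.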
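I think the problem is the append. Each tfidf.transform(freq_term_vector).toarray() call returns a 2-D array of shape (1, 38), so appending one per document produces a 3-D array of shape (n_samples, 1, 38), which matches the (1100,1,38) in the error. Fitting the transformer on one document at a time also means the IDF weights come from a single document rather than the whole training set. Build the full count matrix first, fit the transformer once on the training data, and reuse it for the test data. Try the following: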
    ...
    x = tfidf.transform(freq_term_vector)
    ...
    x_test = tfidf.transform(freq_term_vector_test)
    clf.predict(x_test)
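A fuller, self-contained sketch of the same idea, assuming freq_term_vector and freq_term_vector_test are 2-D count matrices (one row per document, stacked rather than appended one transform at a time) and using scikit-learn's LinearSVC as the SVM. The toy matrices and labels are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.svm import LinearSVC

# Made-up count matrices: one row per document, one column per n-gram
# in global_vector.
freq_term_vector = np.array([
    [2, 0, 1, 0],
    [0, 1, 0, 2],
    [1, 0, 2, 0],
    [0, 2, 0, 1],
])
y = np.array([0, 1, 0, 1])
freq_term_vector_test = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
])

tfidf = TfidfTransformer(norm="l2")
# Learn IDF weights once, from the whole training matrix.
x = tfidf.fit_transform(freq_term_vector)
# Reuse the same fitted transformer so test columns line up with training.
x_test = tfidf.transform(freq_term_vector_test)

clf = LinearSVC()
clf.fit(x, y)
pred = clf.predict(x_test)
print(x.shape, x_test.shape, pred)
```

As an alternative (a different tool than the question uses), scikit-learn's TfidfVectorizer with ngram_range=(n, n) does the n-gram counting and TF-IDF weighting in one step, though its default tokenization may differ from the author's ngram helper.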