Friday, 15 March 2013

matrix - How can I speed up an autoencoder written in Python's Theano package to use on text data?




I'm new to Theano, and I'm trying to adapt the autoencoder script here to work on text data. That code uses the MNIST dataset as training data, which is in the form of a NumPy 2D array.

My data is a CSR sparse matrix of 100,000 instances with 50,000 features. The matrix is the result of using sklearn's TfidfVectorizer to fit and transform the text data. Because I'm using sparse matrices, I modified the code to use the theano.sparse package to represent my input. My training set is now a symbolic variable:

train_set_x = theano.sparse.shared(train_set)

However, theano.sparse matrices cannot perform all of the operations used in the original script (there is a list of sparse operations here). The code uses the dot and sum tensor methods on the input. I have changed dot to sparse.dot, but I can't find out what to replace sum with, so I am converting the training batches to dense matrices and using the original tensor methods, as shown in this cost function:

def get_cost(self):
    tilde_x = self.get_corrupted_input(self.x, self.corruption)
    y = self.get_hidden_values(tilde_x)
    z = self.get_reconstructed_input(y)
    # make dense, must improve this
    L = - T.sum(sp.dense_from_sparse(self.x) * T.log(z) +
                (1 - sp.dense_from_sparse(self.x)) * T.log(1 - z), axis=1)
    cost = T.mean(L)
    return cost

def get_hidden_values(self, input):
    # use theano.sparse.dot instead of T.dot
    return T.nnet.sigmoid(theano.sparse.dot(input, self.W) + self.b)

The get_corrupted_input and get_reconstructed_input methods remain as they are in the link above. My question is: is there a faster way to do this?

Converting the matrices to dense makes training very slow. It currently takes 20.67 minutes for one training epoch with a batch size of 20 training instances.

Any help or tips you could give would be appreciated!

In the most recent master branch of theano.sparse there is an sp_sum method listed.

(See here.)

If you're not using the bleeding-edge version, I'd install it and see whether calling sp_sum works and whether doing so speeds things up:

pip install --upgrade --no-deps git+git://github.com/theano/theano.git

(And if it does, noting that here would be nice. It's not clear that the sparse functionality is much faster than using dense calculations all the way through on the GPU.)
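To illustrate why a sparse sum could replace the dense conversion, here is a minimal NumPy/SciPy sketch (not Theano code) of the underlying identity: x*log(z) + (1-x)*log(1-z) equals log(1-z) + x*(log(z) - log(1-z)), and the second term is zero wherever x is zero. The batch shape, density, and variable names below are made up for illustration and are not taken from the question's script.

```python
import numpy as np
import scipy.sparse as sps

rng = np.random.default_rng(0)

# Hypothetical mini-batch: 20 instances, 50 features, about 5% nonzero entries.
x = sps.random(20, 50, density=0.05, format="csr", random_state=0)
# Hypothetical reconstruction, kept strictly inside (0, 1) so the logs are finite.
z = rng.uniform(0.05, 0.95, size=x.shape)

# Dense version, as in the question: x is converted to a dense array first.
xd = x.toarray()
dense_cost = -np.sum(xd * np.log(z) + (1 - xd) * np.log(1 - z), axis=1)

# Sparse version: x itself is never densified. The elementwise product with
# the sparse x only needs the values of z at the nonzero positions of x.
weighted = x.multiply(np.log(z) - np.log(1 - z))
row_sums = np.asarray(weighted.sum(axis=1)).ravel()
sparse_cost = -(np.log(1 - z).sum(axis=1) + row_sums)
```

Both versions should produce the same per-instance cost. Note that z is still a dense array in either case, so this only removes the conversion of x; whether that translates into a real speedup inside Theano would have to be measured.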

python matrix sparse-matrix theano autoencoder
