Wednesday, 15 August 2012

parallel processing - speed up nested loops in R to create an adjacency matrix -




I need help parallelizing or speeding up the following nested loop:

I have a list of vertices (each identified by an id number), and each id has a string of numbers associated with it (the strings are of finite length, typically 60-200).

id is a list of approx. 10,000 distinct ids.

seq is a list of sequences (of different lengths), each sequence associated with a unique id.

I want to build an adjacency matrix for a graph on these vertices, where vertex i and vertex j are connected if their sequences have elements in common. Here is the code I am trying to improve:

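id_matrix <- matrix(nrow = length(id), ncol = length(id))
for (i in 1:length(id)) {
  for (j in 1:length(id)) {
    # seq[[i]] extracts the i-th sequence itself; seq[i] would return a one-element list
    id_matrix[i, j] <- length(intersect(seq[[i]], seq[[j]]))
  }
}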

(This produces 0 for ids with non-overlapping sequences and a positive count whenever there is overlap; the counts will be used as edge weights and normalized.)

I tried options such as foreach and %dopar%, but have not been successful. Running with length(id)=100 takes more than 2 minutes! The total run would take at least a month! I am working on a Windows PC with RStudio version 0.98.507.

Any help would be much appreciated, especially on parallelizing these 2 nested loops in R.

Note: this is a sparse matrix: only about 1% of the possible 10^8 edges occur.

Thank you for your help!
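Since only about 1% of the pairs overlap, one option that avoids the pairwise loop altogether is to build a sparse incidence matrix (sequences by distinct values) and multiply it by its own transpose. The sketch below is only an illustration under assumptions: it uses the Matrix package, the names ids, seqs, useqs, inc and adj are made up for the example, and the toy data stands in for the real sequences.

```r
library(Matrix)  # recommended package that ships with standard R installations

# Toy stand-in for the real data (hypothetical values).
ids  <- c("a", "b", "c", "d")
seqs <- list(c(1, 2, 3), c(3, 4), c(5, 6, 6), c(2, 3, 7))

# Drop repeated values within each sequence so the counts match
# length(intersect(x, y)), which ignores duplicates.
useqs <- lapply(seqs, unique)

# Map every distinct value to a column index.
values <- unique(unlist(useqs))
rows <- rep(seq_along(useqs), lengths(useqs))
cols <- match(unlist(useqs), values)

# Sparse incidence matrix: entry (i, k) is 1 if sequence i contains value k.
inc <- sparseMatrix(i = rows, j = cols, x = rep(1, length(rows)),
                    dims = c(length(useqs), length(values)))

# Entry (i, j) of inc %*% t(inc) counts the values shared by sequences i and j.
adj <- tcrossprod(inc)
dimnames(adj) <- list(ids, ids)
```

The result stays sparse, so at roughly 1% density the 10,000 x 10,000 case should need far less memory than a dense matrix. The diagonal holds the number of distinct values in each sequence and can be zeroed if self-loops are not wanted.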

On my computer,

id_matrix <- matrix(0, nrow = length(id), ncol = length(id))
for (i in 1:length(id)) {
  for (j in i:length(id)) {
    # fill (i, j) and (j, i) in one assignment, since the matrix is symmetric
    id_matrix[cbind(c(i, j), c(j, i))] <- length(intersect(seq[[i]], seq[[j]]))
  }
}

takes less than half an hour when length(id) is 10000.
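If the upper-triangle loop is still too slow, its rows can be handed to worker processes. The following is a rough sketch rather than a tuned solution, assuming the base parallel package with a socket cluster (the kind that works on Windows). The names ids, seqs and row_counts, the worker count of 2, and the toy data are all assumptions made for the example.

```r
library(parallel)  # part of base R

# Toy stand-in for the real data (hypothetical values).
ids  <- c("a", "b", "c", "d")
seqs <- list(c(1, 2, 3), c(3, 4), c(5, 6, 6), c(2, 3, 7))
n <- length(seqs)

# Count shared values between sequence i and every sequence j >= i.
row_counts <- function(i, seqs) {
  vapply(i:length(seqs),
         function(j) length(intersect(seqs[[i]], seqs[[j]])),
         integer(1))
}

cl <- makeCluster(2)  # socket cluster; the worker count is an arbitrary choice
upper <- parLapply(cl, seq_len(n), row_counts, seqs = seqs)
stopCluster(cl)

# Assemble the symmetric matrix from the upper-triangle rows.
id_matrix <- matrix(0L, nrow = n, ncol = n, dimnames = list(ids, ids))
for (i in seq_len(n)) {
  id_matrix[i, i:n] <- upper[[i]]
  id_matrix[i:n, i] <- upper[[i]]
}
```

Each worker receives its own copy of seqs, so the speedup is limited by the number of cores and by the cost of sending the data. Early rows also carry more work than late ones, so the load may be uneven across workers.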

r parallel-processing adjacency-matrix
