Parallel processing: speed up nested loops in R to create an adjacency matrix
I need help parallelizing or speeding up the following nested loop.
I have a list of vertices (identified by ID number), and each ID has a string of numbers associated with it (the strings are of finite length, typically 60-200). id is a list of approx. 10,000 distinct IDs.
seq is a list of sequences (of different lengths), each sequence associated with a unique ID.
I want to build an adjacency matrix for the graph on these vertices, where vertex i and vertex j are connected if their sequences have elements in common. Here is the code I am trying to improve:
    id_matrix <- matrix(nrow = length(id), ncol = length(id))
    for (i in 1:length(id)) {
      for (j in 1:length(id)) {
        # number of elements shared by sequence i and sequence j
        id_matrix[i, j] <- length(intersect(seq[[i]], seq[[j]]))
      }
    }

(This produces 0 for IDs with non-overlapping sequences and a finite number whenever there is an overlap. These counts will be used as edge weights and then normalized.)
I tried options such as foreach, dopar, etc., but have not been successful. Running with length(id)=100 takes more than 2 minutes! The total run would take at least a month! I am working on a Windows PC with RStudio version 0.98.507.
Any help would be much appreciated, especially on parallelizing these 2 nested loops in R.
Note: this is a sparse matrix: only about 1% of the possible 10^8 edges occur.
Thank you for your help!
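For readers looking for the parallel version the question asks about, here is a minimal sketch that uses the base `parallel` package, whose socket clusters also work on Windows. The helper names `overlap_row`, `assemble` and `pairwise_overlap_parallel`, the argument name `seqs` (standing in for the question's `seq`), and the default of 2 workers are all assumptions made for illustration. They are not taken from the original post, and the speed-up has not been measured.

```r
library(parallel)

# Overlap counts between sequence i and every sequence j >= i
# (only the upper triangle, since the matrix is symmetric).
overlap_row <- function(i, seqs) {
  n <- length(seqs)
  vapply(i:n, function(j) length(intersect(seqs[[i]], seqs[[j]])), integer(1))
}

# Assemble the full symmetric matrix from the per-row results.
assemble <- function(rows) {
  n <- length(rows)
  m <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    m[i, i:n] <- rows[[i]]   # upper triangle
    m[i:n, i] <- rows[[i]]   # mirrored lower triangle
  }
  m
}

# Hypothetical wrapper: distribute the rows over a socket cluster.
pairwise_overlap_parallel <- function(seqs, n_workers = 2) {
  cl <- makeCluster(n_workers)        # PSOCK cluster by default
  on.exit(stopCluster(cl))            # always shut the workers down
  rows <- parLapply(cl, seq_along(seqs), overlap_row, seqs = seqs)
  assemble(rows)
}
```

It would be called as `adj <- pairwise_overlap_parallel(seq)`. Because each row has fewer pairs than the one before, the workers receive unequal amounts of work. `parLapplyLB` from the same package is a load-balancing alternative that may help with that.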
On my computer,

    id_matrix <- matrix(0, nrow = length(id), ncol = length(id))
    for (i in 1:length(id)) {
      for (j in i:length(id)) {
        # fill [i, j] and [j, i] together, so only pairs with j >= i are computed
        id_matrix[cbind(c(i, j), c(j, i))] <- length(intersect(seq[[i]], seq[[j]]))
      }
    }

takes less than half an hour when length(id) is 10000.
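Since the question notes that only about 1% of the possible edges occur, another option is to avoid the double loop altogether. This is a different technique from the loop above and is only a sketch. It builds a sparse ID-by-element incidence matrix with the `Matrix` package and multiplies it by its own transpose, which gives the same counts as `length(intersect(...))`. The toy values of `id` and `seqs` are made up for illustration, with `seqs` standing in for the question's `seq`.

```r
library(Matrix)

# Toy data standing in for the question's `id` and `seq` (hypothetical values).
id   <- c("a", "b", "c", "d")
seqs <- list(c(1, 2, 3), c(3, 4), c(5, 6), c(2, 3, 7))

# Drop duplicates within each sequence so counts match length(intersect(...)).
seqs_u <- lapply(seqs, unique)

# Map every distinct element to a column index.
elements <- unique(unlist(seqs_u))
rows <- rep(seq_along(seqs_u), lengths(seqs_u))
cols <- match(unlist(seqs_u), elements)

# Sparse incidence matrix: B[i, k] = 1 if sequence i contains element k.
B <- sparseMatrix(i = rows, j = cols, x = 1,
                  dims = c(length(seqs_u), length(elements)))

# tcrossprod(B) is B %*% t(B): entry [i, j] counts the elements
# shared by sequences i and j, and the result stays sparse.
adj <- tcrossprod(B)
```

Rows and columns of `adj` follow the order of `id`. As with the loop, the diagonal holds the number of distinct elements in each sequence, so it may need to be zeroed before the counts are used as edge weights.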
r parallel-processing adjacency-matrix