Building large XML trees in R

I am trying to create a large XML tree in R. Here is a simplified version of the code:

library(XML)
N = 100000  # In practice N is larger: 10^8 to 10^9
seq = newXMLNode("sequence")
pars = as.character(1:N)
for(i in 1:N)
    newXMLNode("Parameter", parent=seq, attrs=c(id=pars[i]))

      

When N is about 10^6 it takes about a minute; at 10^7 it takes about forty minutes. Is there a way to speed this up?

Using the paste command:

par_tmp = paste('<Parameter id="', pars, '"/>', sep="")

      

takes less than a second.



1 answer


I would recommend profiling the function with Rprof or the profr package. This will show you where your bottleneck is, and then you can think about how to optimize the function or change the way you use it.
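As a rough illustration of the Rprof route, here is a minimal sketch using only base R. The function build_tags and the workload size are hypothetical stand-ins so the example runs without the XML package; swap in the newXMLNode loop to profile the real code.

```r
# Hypothetical stand-in workload: a loop of scalar paste() calls.
# Replace the call to build_tags() with the newXMLNode() loop to profile the real code.
build_tags <- function(n) {
  out <- character(n)
  for (i in seq_len(n)) {
    out[i] <- paste('<Parameter id="', i, '"/>', sep = "")
  }
  out
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # start sampling the call stack
tags <- build_tags(200000)
Rprof(NULL)                        # stop profiling

# summaryRprof() stops with an error if no samples were recorded,
# so guard against a workload that finished too quickly.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

The by.self table lists the functions in which the sampler most often found the interpreter, which is usually the first place to look for a bottleneck.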

Your paste example will be much faster in part because it is vectorized. For a fairer comparison, call paste in a loop, as you are currently doing with newXMLNode, and see the time difference.
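The comparison suggested above might look something like the sketch below. It uses only base R; the variable names looped, vectorized, t_loop and t_vec are mine, and the timings will vary by machine, so none are quoted here.

```r
N <- 100000
pars <- as.character(1:N)

# Looped: one paste() call per element, mirroring the newXMLNode() loop.
looped <- character(N)
t_loop <- system.time(
  for (i in 1:N) looped[i] <- paste('<Parameter id="', pars[i], '"/>', sep = "")
)

# Vectorized: a single paste() call over the whole vector.
t_vec <- system.time(
  vectorized <- paste('<Parameter id="', pars, '"/>', sep = "")
)

cat("looped:", t_loop[["elapsed"]], "s; vectorized:", t_vec[["elapsed"]], "s\n")
```

Both versions produce the same strings, so any gap between the two timings is per-call and loop overhead rather than a difference in the work done.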

Edit:



Here is the output from profiling your loop with profr.

library(profr)
xml.prof <- profr(for(i in 1:N) 
    newXMLNode("Parameter", parent=seq, attrs=c(id=pars[i])))
plot(xml.prof)

      

There is nothing particularly obvious here that you could improve. I see that it spends a reasonable amount of time in the %in% function, so improving that would cut the overall time somewhat (although you still have to loop many times, so it won't make a big difference). A better solution would be to rewrite newXMLNode as a vectorized function so you can skip the for loop entirely.
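One way to approximate a vectorized version without touching newXMLNode itself is to build the document as text and parse it once. This is only a sketch: build_sequence_xml is a hypothetical helper, not part of the XML package, and it assumes the ids contain no characters that need escaping in an XML attribute (such as &, < or "). The parse step uses XML::xmlParse with asText = TRUE and is skipped if the package is not installed.

```r
# Hypothetical helper (not part of the XML package): builds the whole
# document as text with one vectorized paste() call.
# Assumes ids need no XML attribute escaping.
build_sequence_xml <- function(ids) {
  params <- paste('<Parameter id="', ids, '"/>', sep = "")
  paste(c("<sequence>", params, "</sequence>"), collapse = "")
}

xml_text <- build_sequence_xml(as.character(1:5))
cat(xml_text, "\n")

# Optional: parse the text into a tree if the XML package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
}
```

For N in the range of 10^8 to 10^9 the single string becomes very large, so it may be more practical to write the pieces to a file in chunks rather than hold everything in memory.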
