jackrabbit-users mailing list archives

From Tai Tran <hutieu...@gmail.com>
Subject The fastest way to dump JackRabbit data?
Date Thu, 22 Apr 2010 14:48:32 GMT
Hi,

I'm very new to JackRabbit, but I'm facing a performance-critical task in my
project that requires dumping the whole JackRabbit repository into a CSV file.

We're using the JackRabbit standalone server 1.6.0 with MySQL 5.x to store a
huge hierarchical data set of network devices. Each device can have up to 100
attributes and several thousand child nodes nested to an arbitrary depth:

device[1]
   rack
      subrack
         port
            ...
      ...
   ...

device[2]
   ...

device[5000]
   ...

We need to dump the whole JackRabbit tree structure into a flat CSV file, with
each row holding the data of one node. The output CSV is as large as the source
JackRabbit data, up to 3.6 million lines, in the following format:

rack, attr1, attr2, ...
rack, attr1, attr2, ...
...
subrack, attr1, attr2, ...
...

To minimize calls through the RMI access layer, we tried iterating over each
device in the repository, using Node.exportSystemView() to dump its subtree
into an XML file on disk, and then parsing that XML to generate the CSV output.
However, this is very slow: dumping the whole JackRabbit data took more than 5
hours on a very fast server, while our target is to complete within 15 minutes
(almost insane)!
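
For reference, our current export loop is roughly the sketch below. The RMI
URL, credentials and the "devices" parent path are placeholders for our actual
setup; the export call is session.exportSystemView() from the JCR API.

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.rmi.repository.URLRemoteRepository;

public class DeviceXmlDump {
    public static void main(String[] args) throws Exception {
        // Connect to the standalone server over RMI (URL and credentials are placeholders).
        Repository repository = new URLRemoteRepository("http://localhost:8080/rmi");
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Iterate over the devices (the "devices" parent path is a placeholder).
            NodeIterator devices = session.getRootNode().getNode("devices").getNodes();
            while (devices.hasNext()) {
                Node device = devices.nextNode();
                OutputStream out = new BufferedOutputStream(new FileOutputStream(
                        "dump/" + device.getName() + "-" + device.getIndex() + ".xml"));
                try {
                    // Export the full subtree of this device as system view XML.
                    session.exportSystemView(device.getPath(), out, false, false);
                } finally {
                    out.close();
                }
            }
            // Each XML file is then parsed in a second pass to produce the CSV rows.
        } finally {
            session.logout();
        }
    }
}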

Now we're planning to modify the JackRabbit source code to add a customized
version of exportSystemView, in the hope of tackling this performance issue.
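
To be concrete, what we have in mind would boil down to a plain recursive
traversal that writes CSV rows directly, run inside the repository process
against a local (non-RMI) session so the intermediate XML step and the per-call
RMI overhead disappear. The sketch below only illustrates the idea; the
property filtering, the column order and the "devices" parent path are
assumptions, not our final design.

import java.io.BufferedWriter;
import java.io.FileWriter;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.Session;

public class CsvExporter {

    // Write one CSV row per node (node name followed by its single-valued,
    // non-system properties), then recurse into the children. The column
    // selection and ordering here are only illustrative.
    static void export(Node node, BufferedWriter out) throws Exception {
        StringBuilder row = new StringBuilder(node.getName());
        PropertyIterator props = node.getProperties();
        while (props.hasNext()) {
            Property p = props.nextProperty();
            if (!p.getDefinition().isMultiple() && !p.getName().startsWith("jcr:")) {
                row.append(", ").append(p.getString());
            }
        }
        out.write(row.toString());
        out.newLine();

        NodeIterator children = node.getNodes();
        while (children.hasNext()) {
            export(children.nextNode(), out);
        }
    }

    static void dump(Session session, String csvFile) throws Exception {
        BufferedWriter out = new BufferedWriter(new FileWriter(csvFile));
        try {
            // "devices" is a placeholder for the actual parent node of the device trees.
            export(session.getRootNode().getNode("devices"), out);
        } finally {
            out.close();
        }
    }
}

The appeal of this approach (if it is viable) is that rows are streamed to disk
as the tree is walked, so memory stays flat and nothing is serialized twice.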

Any suggestions are really appreciated!!!

Thanks a lot,
Tai Tran
