jackrabbit-users mailing list archives

From Justin Edelson <justinedel...@gmail.com>
Subject Re: The fastest way to dump JackRabbit data?
Date Thu, 22 Apr 2010 15:34:16 GMT
A few suggestions:
* Don't produce an XML document and then parse it again; use the
exportSystemView() variant that accepts a ContentHandler instead of an
OutputStream, and handle the SAX events directly to produce your CSV.

* This looks like something that can be heavily parallelized. Fire up
5000 nodes and 15 minutes may be achievable :)
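A minimal sketch of the first suggestion follows. It assumes the standard JCR call `session.exportSystemView(String absPath, ContentHandler contentHandler, boolean skipBinary, boolean noRecurse)` and the system-view elements `sv:node`, `sv:property` and `sv:value` in the `http://www.jcp.org/jcr/sv/1.0` namespace. The class name `SystemViewCsvHandler`, the `|` separator for multi-valued properties and the column layout (node name, then property values in document order) are illustrative choices, not anything Jackrabbit prescribes. Rows are emitted when a node's element closes, so children appear before their parents. To keep it runnable without a repository, `main` feeds it a small hand-written system-view string instead of a live export.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: one CSV row per sv:node (node name, then property values). */
public class SystemViewCsvHandler extends DefaultHandler {
    private static final String SV = "http://www.jcp.org/jcr/sv/1.0";

    private final Writer out;
    private final Deque<List<String>> rows = new ArrayDeque<>(); // open rows, innermost node on top
    private List<String> values;   // values of the property being read
    private StringBuilder text;    // character data of the current sv:value

    public SystemViewCsvHandler(Writer out) { this.out = out; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!SV.equals(uri)) return;
        switch (local) {
            case "node": {
                List<String> row = new ArrayList<>();
                row.add(atts.getValue(SV, "name")); // first column: node name
                rows.push(row);
                break;
            }
            case "property": values = new ArrayList<>(); break;
            case "value": text = new StringBuilder(); break;
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (!SV.equals(uri)) return;
        switch (local) {
            case "value": values.add(text.toString()); text = null; break;
            case "property": rows.peek().add(String.join("|", values)); values = null; break;
            case "node": writeRow(rows.pop()); break; // children are written before parents
            default: break;
        }
    }

    private void writeRow(List<String> row) throws SAXException {
        try {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) out.write(',');
                out.write(escape(row.get(i)));
            }
            out.write('\n');
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    /** Minimal CSV quoting: quote fields containing a comma, quote or newline. */
    static String escape(String s) {
        if (s == null) return "";
        if (s.indexOf(',') < 0 && s.indexOf('"') < 0 && s.indexOf('\n') < 0) return s;
        return '"' + s.replace("\"", "\"\"") + '"';
    }

    /** Runs the handler over a system-view XML string (stand-in for a live export). */
    public static String convert(String systemViewXml) throws Exception {
        StringWriter csv = new StringWriter();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required so uri/localName are reported
        factory.newSAXParser().parse(new InputSource(new StringReader(systemViewXml)),
                new SystemViewCsvHandler(csv));
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sv:node xmlns:sv='http://www.jcp.org/jcr/sv/1.0' sv:name='rack'>"
                + "<sv:property sv:name='attr1' sv:type='String'><sv:value>a, b</sv:value></sv:property>"
                + "<sv:node sv:name='subrack'>"
                + "<sv:property sv:name='attr1' sv:type='Long'><sv:value>42</sv:value></sv:property>"
                + "</sv:node></sv:node>";
        System.out.print(convert(xml)); // subrack,42 then rack,"a, b"
    }
}
```

Against a repository, the same handler would be passed straight to the export, e.g. `session.exportSystemView("/device1", new SystemViewCsvHandler(writer), true, false)`, so no intermediate XML file is written or re-parsed. A real export also reports properties such as `jcr:primaryType`, which would show up as extra columns unless filtered.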
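On the parallelization point, here is a single-machine sketch of the idea: one export task per device on a fixed thread pool, with the per-device CSV fragments concatenated in the original device order. The `DeviceExporter` interface and `ParallelDump` class are hypothetical; a real exporter would presumably open its own JCR Session per worker (sessions should not be shared between threads) and run a SAX-based export for its device. Whether this helps at all depends on where the bottleneck actually is (RMI, MySQL, or the repository itself).

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDump {
    /** Hypothetical callback that writes the CSV rows for one device subtree. */
    @FunctionalInterface
    public interface DeviceExporter {
        void export(String devicePath, Writer out) throws Exception;
    }

    /** Exports all devices on a fixed-size pool and returns the CSV in input order. */
    public static String dumpAll(List<String> devicePaths, int threads, DeviceExporter exporter)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> parts = new ArrayList<>();
            for (String path : devicePaths) {
                parts.add(pool.submit(() -> {
                    StringWriter buf = new StringWriter(); // one buffer per device
                    exporter.export(path, buf);
                    return buf.toString();
                }));
            }
            StringBuilder all = new StringBuilder();
            for (Future<String> part : parts) {
                // get() blocks until that device is done and surfaces any
                // export failure wrapped in an ExecutionException.
                all.append(part.get());
            }
            return all.toString();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> devices = List.of("/device1", "/device2", "/device3");
        // Stand-in exporter for illustration only.
        String csv = dumpAll(devices, 2, (path, out) -> out.write(path.substring(1) + ",ok\n"));
        System.out.print(csv); // device1,ok / device2,ok / device3,ok
    }
}
```

Buffering every fragment in memory is only reasonable for a demo; at 3.6 million rows, each task would more sensibly write its own part file, with the parts concatenated afterwards.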

Justin

On 4/22/10 10:48 AM, Tai Tran wrote:
> Hi,
> 
> I'm very new to JackRabbit, but I'm facing a performance-critical task
> in my project: dumping all of the JackRabbit data into a CSV
> file.
> 
> We're using the JackRabbit 1.6.0 standalone server with MySQL 5.x to store
> a huge hierarchy of network-device data. Each device can have up to 100
> attributes and several thousand child nodes beneath it, nested to arbitrary depth:
> 
> device[1]
>    rack
>       subrack
>          port
>             ...
>       ...
>    ...
> 
> device[2]
>    ...
> 
> device[5000]
>    ...
> 
> We need to flatten the whole JackRabbit tree into a CSV file in which each
> row holds the data of one node. The output CSV is as large as the source
> JackRabbit data, up to 3.6 million lines, in the following
> format:
> 
> rack, attr1, attr2, ...
> rack, attr1, attr2, ...
> ...
> subrack, attr1, attr2, ...
> ...
> 
> To minimize calls through the RMI access layer, we tried iterating over each
> device in the repository and using Session.exportSystemView() to dump its
> data into an XML file on disk, and then parsing that file to generate the
> CSV output. However, this is very slow: dumping the whole JackRabbit data
> took more than 5 hours on a very fast server, while our target is to finish
> within 15 minutes (almost insane)!
> 
> Now we're planning to modify the JackRabbit source code to add our own
> customized version of exportSystemView, in the hope of tackling this
> performance issue.
> 
> Any suggestions are much appreciated!
> 
> Thanks a lot,
> Tai Tran
> 
