jackrabbit-dev mailing list archives

From j.@neasys.com
Subject Re: large repository
Date Tue, 25 Oct 2005 21:59:33 GMT
Hi, Marcel,

Thanks a lot for your reply. One more question:
how does the bdb persistence manager compare with the db persistence manager?
Which one can hold more items?

John

On Tue, Oct 25, 2005 at 09:08:00AM +0200, Marcel Reutegger wrote:
> Hi John,
> 
> js@neasys.com wrote:
> >I have tried jcr/jackrabbit and like it.
> >Next I would like to push jackrabbit to its limit:
> >load in as many items as possible. I would appreciate help on
> >a few configuration/tuning issues:
> >(1) which persistent manager to use?
> 
> in a recent test I imported over a million wikipedia articles which 
> resulted in about 6 million items. no versioning, btw.
> 
> my configuration is:
> dell latitude d505
> db-persistence using derby
> 256m heap
> 
> at the beginning the time to add an article was about 5ms.
> towards the end of the load the time to add an article was stable at 
> about 50ms.
> 
> some other figures:
> db size: 2 GB
> index size: 300 MB
> 
> >(2) what parameters to tune?
> 
> I can give you some advice on configuring the index: the default config 
> will cause lucene to create segments of 100 nodes, which will be merged 
> as soon as 10 segments exist. when doing a bulk load you should set 
> the parameter minMergeDocs to a higher value, e.g. 1000. this will create 
> segments of 1000 nodes, and will be more efficient.
> 
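To make the segment arithmetic above concrete, here is a minimal sketch of a simplified merge model. It is not Lucene's or Jackrabbit's actual code: the function name simulate_bulk_load and the rule "merge whenever 10 equal-sized segments exist" are assumptions taken only from the description in this thread.

```python
def simulate_bulk_load(total_docs, min_merge_docs, merge_factor=10):
    """Simplified model of segment merging during a bulk load (an assumption,
    not Lucene's real merge policy).

    Returns (merge_count, docs_rewritten, final_segment_count).
    """
    segments = []   # segment sizes, oldest first
    merges = 0      # number of merge operations performed
    rewritten = 0   # total documents copied by merges

    full, remainder = divmod(total_docs, min_merge_docs)
    for _ in range(full):
        # A new segment is flushed once min_merge_docs nodes are buffered.
        segments.append(min_merge_docs)
        # Cascade: merge while the newest merge_factor segments are equal-sized.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            merges += 1
            rewritten += merged

    if remainder:
        # Leftover nodes form one last, smaller segment.
        segments.append(remainder)
    return merges, rewritten, len(segments)


if __name__ == "__main__":
    for size in (100, 1000):
        print(size, simulate_bulk_load(1_000_000, size))
```

Under this model, loading one million nodes takes 1111 merges that rewrite 4 million nodes with segments of 100, versus 111 merges that rewrite 3 million nodes with segments of 1000. Real merge behaviour and the resulting speedup will differ.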
> >(3) will multiple workspaces help?
> 
> IMO this might help, if you run into scalability issues with the 
> persistence manager you are using.
> 
> >(4) any other things to watch for?
> 
> use separate disks for the index and workspace data.
> 
> >My host has 4GB ram and a few TB diskspace.
> >
> >Also, is there any doc describing all possible elements in repository.xml?
> 
> the sample repository.xml file in src/conf contains an inline dtd that 
> contains some documentation.
> 
> >And can SearchIndex be turned off?
> 
> yes, this is possible. you simply omit the SearchIndex element in the 
> configuration. though, I would be very interested to see how well the 
> index works with your data.
> 
> regards
>  marcel
> 
> 
