jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: large repository
Date Wed, 26 Oct 2005 07:14:39 GMT
Hi John,

I haven't tried the bdb persistence manager yet.

but it seems that Brian is working with it, so maybe he can share his 
experience?

regards
  marcel

js@neasys.com wrote:
> Hi, Marcel,
> 
> Thanks a lot for your reply. One more question:
> how does bdb persistence compare with db persistence?
> Which one will be able to hold more items?
> 
> John
> 
> On Tue, Oct 25, 2005 at 09:08:00AM +0200, Marcel Reutegger wrote:
> 
>>Hi John,
>>
>>js@neasys.com wrote:
>>
>>>I have tried jcr/jackrabbit and like it.
>>>Next I would like to push jackrabbit to its limit:
>>>load in as many items as possible. I would appreciate help on
>>>a few configuration/tuning issues:
>>>(1) which persistent manager to use?
>>
>>in a recent test I imported over a million wikipedia articles which 
>>resulted in about 6 million items. no versioning, btw.
>>
>>my configuration is:
>>dell latitude d505
>>db-persistence using derby
>>256m heap
>>
>>at the beginning the time to add an article was about 5ms.
>>towards the end of the load the time to add an article was stable at 
>>about 50ms.
>>
>>some other figures:
>>db size: 2 GB
>>index size: 300 MB
>>
>>
>>>(2) what parameters to tune?
>>
>>I can give you some advice on configuring the index: the default config 
>>will cause lucene to create segments of 100 nodes, which will be merged 
>>as soon as 10 segments exist. when doing a bulk load you should set 
>>the parameter minMergeDocs to a higher value, e.g. 1000. this will create 
>>segments of 1000 nodes, and will be more efficient.
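
As a rough sketch of what that setting could look like in repository.xml: the parameter name minMergeDocs is the one mentioned above, while the class name and the path value are assumptions taken from memory of the sample configuration, so please check them against the repository.xml shipped with your version.

```xml
<!-- Sketch of a SearchIndex element inside a Workspace definition.
     The class name and the path value are assumptions; verify them
     against the sample repository.xml of your Jackrabbit version. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- larger segments for a bulk load (the default mentioned above is 100) -->
  <param name="minMergeDocs" value="1000"/>
</SearchIndex>
```

After the bulk load it may make sense to go back to a smaller value, but that is a judgment call and depends on your workload.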
>>
>>
>>>(3) will multiple workspaces help?
>>
>>IMO this might help, if you run into scalability issues with the 
>>persistence manager you are using.
>>
>>
>>>(4) any other things to watch for?
>>
>>use separate disks for the index and workspace data.
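
One possible way to do this, assuming the SearchIndex element accepts a path parameter as in the sample configuration, is to point the index directory at a mount on a second disk. The mount point and directory names below are purely hypothetical.

```xml
<!-- Sketch only: places the index on a different disk than the
     workspace data. The class name is an assumption to verify. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- /mnt/disk2 is a hypothetical mount point for the second disk -->
  <param name="path" value="/mnt/disk2/jackrabbit/default/index"/>
</SearchIndex>
```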
>>
>>
>>>My host has 4GB ram and a few TB diskspace.
>>>
>>>Also, is there any doc describing all possible elements in repository.xml?
>>
>>the sample repository.xml file in src/conf contains an inline dtd that 
>>contains some documentation.
>>
>>
>>>And can SearchIndex be turned off?
>>
>>yes, this is possible. you simply omit the SearchIndex element in the 
>>configuration. though, I would be very interested to see how well the 
>>index works with your data.
>>
>>regards
>> marcel
>>
>>
> 
> __________________________________________
> http://www.neasys.com - A Good Place to Be
> Come to visit us today!
> 
> 
