jackrabbit-dev mailing list archives

From Alexandru Popescu <the.mindstorm.mailingl...@gmail.com>
Subject Re: large repository
Date Wed, 26 Oct 2005 09:27:31 GMT
#: Marcel Reutegger changed the world a bit at a time by saying on  10/26/2005 9:14 AM :#
> Hi John,
> 
> I haven't tried the bdb persistence manager yet.
> 
> but it seems that brian is working with it, maybe he can share his 
> experience?
> 
> regards
>   marcel
> 

How does db-persistence (i.e. Derby) store binary content? For example, are uploaded files
stored in the DB as BLOBs, or on the file system, as the Berkeley DB persistence manager does?

thanks,

./alex
--
.w( the_mindstorm )p.

> js@neasys.com wrote:
>> Hi, Marcel,
>> 
>> Thanks a lot for your reply. One more question:
>> how does bdb persistence compare with db persistence?
>> Which one will be able to hold more items?
>> 
>> John
>> 
>> On Tue, Oct 25, 2005 at 09:08:00AM +0200, Marcel Reutegger wrote:
>> 
>>>Hi John,
>>>
>>>js@neasys.com wrote:
>>>
>>>>I have tried jcr/jackrabbit and like it.
>>>>Next I would like to push jackrabbit to its limit:
>>>>load in as many items as possible. I would appreciate help on
>>>>a few configuration/tuning issues:
>>>>(1) which persistent manager to use?
>>>
>>>in a recent test I imported over a million wikipedia articles which 
>>>resulted in about 6 million items. no versioning, btw.
>>>
>>>my configuration is:
>>>dell latitude d505
>>>db-persistence using derby
>>>256m heap
>>>
>>>at the beginning the time to add an article was about 5ms.
>>>towards the end of the load the time to add an article was stable at 
>>>about 50ms.
>>>
>>>some other figures:
>>>db size: 2 GB
>>>index size: 300 MB
>>>
>>>
>>>>(2) what parameters to tune?
>>>
>>>I can give you some advice on configuring the index: the default config 
>>>will cause lucene to create segments of 100 nodes, which will be merged 
>>>as soon as 10 segments exist. when doing a bulk load you should set 
>>>the parameter minMergeDocs to a higher value, e.g. 1000. this will create 
>>>segments of 1000 nodes, and will be more efficient.
>>>
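
A rough sketch of why the larger value helps, using a simplified model of the behaviour described above (segments of minMergeDocs nodes, merged as soon as 10 of the same size exist). The function name, the merge_factor parameter and the 1,000,000 node count are assumptions made for illustration; this is not Lucene's or Jackrabbit's actual merge code.

```python
def simulate_bulk_load(total_docs, min_merge_docs=100, merge_factor=10):
    """Count segment flushes and merges under a simplified merge model.

    Assumption: every min_merge_docs nodes are flushed as one segment, and
    whenever merge_factor segments of equal size exist they are merged into
    one larger segment. Merges may cascade to larger sizes.
    """
    segments = []    # segment sizes in nodes, oldest first
    flushes = 0      # number of new segments written
    merges = 0       # number of merge operations
    docs_merged = 0  # nodes rewritten by merges (a proxy for merge I/O)

    for _ in range(total_docs // min_merge_docs):
        segments.append(min_merge_docs)
        flushes += 1
        # Merge while the newest merge_factor segments all have the same size.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            merges += 1
            docs_merged += merged

    return {
        "flushes": flushes,
        "merges": merges,
        "docs_merged": docs_merged,
        "segments": segments,
    }


if __name__ == "__main__":
    # Hypothetical load of 1,000,000 nodes, default vs. bulk-load setting.
    for value in (100, 1000):
        print(value, simulate_bulk_load(1_000_000, min_merge_docs=value))
```

Under this model, loading 1,000,000 nodes takes 10,000 flushes and 1,111 merges with minMergeDocs at 100, versus 1,000 flushes and 111 merges at 1000. The real index behaves differently in detail, so treat the numbers as an indication of the trend only.
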
>>>
>>>>(3) will multiple workspaces help?
>>>
>>>IMO this might help, if you run into scalability issues with the 
>>>persistence manager you are using.
>>>
>>>
>>>>(4) any other things to watch for?
>>>
>>>use separate disks for the index and workspace data.
>>>
>>>
>>>>My host has 4GB ram and a few TB diskspace.
>>>>
>>>>Also, any doc describing all possible elements in repository.xml?
>>>
>>>the sample repository.xml file in src/conf contains an inline dtd that 
>>>contains some documentation.
>>>
>>>
>>>>And can SearchIndex be turned off?
>>>
>>>yes, this is possible. you simply omit the SearchIndex element in the 
>>>configuration. though, I would be very interested to see how well the 
>>>index works with your data.
>>>
>>>regards
>>> marcel
>>>
>>>
> 

