From: Alexandru Popescu
Date: Wed, 26 Oct 2005 11:27:31 +0200
To: jackrabbit-dev@incubator.apache.org
Subject: Re: large repository
Message-ID: <435F4C03.5090706@gmail.com>
In-Reply-To: <435F2CDF.5090809@gmx.net>

#: Marcel Reutegger changed the world a bit at a time by saying on 10/26/2005 9:14 AM :#
> Hi John,
>
> I haven't tried the bdb persistence manager yet.
>
> but it seems that brian is working with it, maybe he can share his
> experience?
>
> regards
> marcel
>

How does db-persistence (i.e. Derby) store binary content? (I mean, are
uploaded files, for example, stored in the DB as blobs, or on the file
system, as Berkeley DB does?)

thanks,

./alex
--
.w( the_mindstorm )p.

> js@neasys.com wrote:
>> Hi, Marcel,
>>
>> Thanks a lot for your reply. One more question:
>> how does bdb persistence compare with db persistence?
>> Which one will be able to hold more items?
>>
>> John
>>
>> On Tue, Oct 25, 2005 at 09:08:00AM +0200, Marcel Reutegger wrote:
>>
>>>Hi John,
>>>
>>>js@neasys.com wrote:
>>>
>>>>I have tried jcr/jackrabbit and like it.
>>>>Next I would like to push jackrabbit to its limit:
>>>>load in as many items as possible. I would appreciate help on
>>>>a few configuration/tuning issues:
>>>>(1) which persistence manager to use?
>>>
>>>in a recent test I imported over a million wikipedia articles, which
>>>resulted in about 6 million items. no versioning, btw.
>>>
>>>my configuration is:
>>>dell latitude d505
>>>db-persistence using derby
>>>256m heap
>>>
>>>at the beginning the time to add an article was about 5ms.
>>>towards the end of the load the time to add an article was stable at
>>>about 50ms.
>>>
>>>some other figures:
>>>db size: 2 GB
>>>index size: 300 MB
>>>
>>>
>>>>(2) what parameters to tune?
>>>
>>>I can give you some advice on configuring the index: the default config
>>>will cause lucene to create segments of 100 nodes, which will be merged
>>>as soon as 10 segments exist. when doing a bulk load you should set
>>>the parameter minMergeDocs to a higher value, e.g. 1000. this will create
>>>segments of 1000 nodes, which is more efficient.
>>>
>>>
>>>>(3) will multiple workspaces help?
>>>
>>>IMO this might help if you run into scalability issues with the
>>>persistence manager you are using.
>>>
>>>
>>>>(4) any other things to watch for?
>>>
>>>use separate disks for the index and workspace data.
>>>
>>>
>>>>My host has 4GB RAM and a few TB of disk space.
>>>>
>>>>Also, is there any doc describing all possible elements in repository.xml?
>>>
>>>the sample repository.xml file in src/conf contains an inline dtd that
>>>contains some documentation.
>>>
>>>
>>>>And can SearchIndex be turned off?
>>>
>>>yes, this is possible: simply omit the SearchIndex element from the
>>>configuration. though I would be very interested to see how well the
>>>index works with your data.
>>>
>>>regards
>>> marcel
>>
>
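Marcel's index tip (segments of 100 nodes, merged once 10 of them exist, versus minMergeDocs raised to 1000) is easier to judge with the arithmetic written out. The sketch below is a simplified model of a log-structured merge policy, assuming a merge factor of 10 in both cases. It is NOT Lucene's or Jackrabbit's actual merge code, and the class and method names (MergeModel, simulate) and the document count are hypothetical, chosen for illustration only.

```java
/**
 * Simplified model of a log-structured merge policy (illustration only, not
 * Lucene's or Jackrabbit's real implementation): documents are buffered in
 * memory and flushed as a segment of minMergeDocs documents; whenever
 * mergeFactor segments of the same size exist, they are merged into one
 * segment that is mergeFactor times larger.
 */
final class MergeModel {

    /** Counters collected while simulating a bulk load. */
    static final class Stats {
        long flushes;     // segments written straight from the in-memory buffer
        long merges;      // merge operations performed
        long docsWritten; // documents written to disk, counting every rewrite by a merge
    }

    static Stats simulate(long totalDocs, long minMergeDocs, int mergeFactor) {
        Stats stats = new Stats();
        // segmentsAtLevel[k] = number of live segments holding minMergeDocs * mergeFactor^k docs
        int[] segmentsAtLevel = new int[64];
        long bufferFlushes = totalDocs / minMergeDocs; // a trailing partial buffer is ignored
        for (long i = 0; i < bufferFlushes; i++) {
            stats.flushes++;
            stats.docsWritten += minMergeDocs;
            segmentsAtLevel[0]++;
            long segmentSize = minMergeDocs;
            // Cascade: a merge at level k may complete a full set at level k + 1.
            for (int level = 0; segmentsAtLevel[level] == mergeFactor; level++) {
                segmentsAtLevel[level] = 0;
                segmentsAtLevel[level + 1]++;
                segmentSize *= mergeFactor;
                stats.merges++;
                stats.docsWritten += segmentSize;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L; // hypothetical document count, not a measured figure
        for (long minMergeDocs : new long[] {100L, 1000L}) {
            Stats s = simulate(docs, minMergeDocs, 10);
            System.out.printf("minMergeDocs=%d: flushes=%d merges=%d docsWritten=%d%n",
                    minMergeDocs, s.flushes, s.merges, s.docsWritten);
        }
    }
}
```

For 1,000,000 documents the model gives 10,000 flushes, 1,111 merges and 5,000,000 document writes with minMergeDocs=100, against 1,000 flushes, 111 merges and 4,000,000 document writes with minMergeDocs=1000. Larger initial segments mean ten times fewer segment files to create and merge, plus one fewer full rewrite level, which is consistent with the "more efficient" remark for bulk loads. Real timings also depend on disk, heap and analyzer costs that this model ignores.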
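Both configuration answers in the thread (raise minMergeDocs for a bulk load; omit the SearchIndex element to turn indexing off) come down to editing the SearchIndex element of repository.xml. The sketch below applies either change using only the JDK's DOM API. It assumes the `<param name="..." value="..."/>` child form used by Jackrabbit's sample configuration. The trimmed workspace fragment and the names SearchIndexConfig, setSearchIndexParam and removeSearchIndex are hypothetical. Check the element layout against the inline DTD in the sample repository.xml in src/conf before relying on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SearchIndexConfig {

    // Hypothetical, heavily trimmed workspace fragment; a real repository.xml has more elements.
    static final String SAMPLE =
          "<Workspace name=\"default\">"
        + "<SearchIndex class=\"org.apache.jackrabbit.core.query.lucene.SearchIndex\">"
        + "<param name=\"path\" value=\"${wsp.home}/index\"/>"
        + "</SearchIndex>"
        + "</Workspace>";

    /** Sets (or adds) a <param name=... value=...> child on every SearchIndex element. */
    static void setSearchIndexParam(Document doc, String name, String value) {
        NodeList indexes = doc.getElementsByTagName("SearchIndex");
        for (int i = 0; i < indexes.getLength(); i++) {
            Element index = (Element) indexes.item(i);
            Element param = null;
            NodeList params = index.getElementsByTagName("param");
            for (int j = 0; j < params.getLength() && param == null; j++) {
                Element p = (Element) params.item(j);
                if (name.equals(p.getAttribute("name"))) {
                    param = p; // reuse the existing param instead of adding a duplicate
                }
            }
            if (param == null) {
                param = doc.createElement("param");
                param.setAttribute("name", name);
                index.appendChild(param);
            }
            param.setAttribute("value", value);
        }
    }

    /** Removes every SearchIndex element, i.e. the "omit SearchIndex" way of disabling indexing. */
    static void removeSearchIndex(Document doc) {
        NodeList indexes = doc.getElementsByTagName("SearchIndex");
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = indexes.getLength() - 1; i >= 0; i--) {
            Node index = indexes.item(i);
            index.getParentNode().removeChild(index);
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        setSearchIndexParam(doc, "minMergeDocs", "1000"); // bulk-load setting from the thread
        System.out.println(serialize(doc));
    }
}
```

Editing the file by hand works just as well. The point of the sketch is where the setting lives: minMergeDocs sits beside the existing index parameters inside SearchIndex, and dropping that whole element is what turns indexing off for the workspace.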