From: "Cris Daniluk"
To: users@jackrabbit.apache.org
Subject: Re: workspace / repository scalability
Date: Wed, 23 May 2007 11:11:47 -0400

> > When you say adequate hierarchical structure, does this imply that we should
> > try to keep our tree "bushy"? Really, because we rely on the external search
> > engine for location, we only direct query on sequential ID at the database.
> > Should a partitioning strategy be used? If so, what sort of depth might we
> > aim for?
> i see... i think it is important to mention that jackrabbit is not optimized for
> long lists of child nodes currently, so i would recommend to stay away if
> possible from more than a couple of hundred child nodes.
> as a guidance for hierarchy i usually use something like:
> "if i wouldn't do it in a filesystem, i don't do it in a content repository"
> (assuming that i view a node as a file or folder)
>
> so let's assume your sequential hex-id is something like "123abc" i would
> recommend something like a partitioning for the node structure as follows:
> /12/3a/bc which leaves you with 256 child nodes per node.
>

So some form of hashing sounds desirable.
If the hashed nodes were mapped to a separate node structure / workspace
that did not hash, to create a logical view, would the performance impact
still apply?

> personally, i like to use the derby persistence manager with external
> fs based blobs (standard setup). with this setup i do
> "hot backups" by just backing up the full repository folder in the filesystem.

GBs of data in Derby of course make me nervous, on a probably irrational
level. The fs-based blob setup is interesting, and may make DR replication
easier.

You mentioned some testing... it would be great for us to do similar
testing with mock data reflecting our environment. If the tests you
performed included any special harnesses, configuration, etc., would it be
possible to see them?

Thanks for all your help!

Cris
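
P.S. To make sure I understand the /12/3a/bc partitioning quoted above, here is a minimal Python sketch of how I read it. The function name partition_path and its parameters are hypothetical and not part of Jackrabbit or the JCR API; the code only builds the path string, and it assumes IDs are zero-padded to a fixed number of hex digits so that every node ends up at the same depth.

```python
HEX_DIGITS = set("0123456789abcdef")


def partition_path(hex_id: str, total_digits: int = 6, width: int = 2) -> str:
    """Map a hex ID to a partitioned node path, e.g. "123abc" -> "/12/3a/bc".

    total_digits and width are illustrative assumptions: IDs are left-padded
    with zeros to total_digits, then cut into segments of `width` hex digits.
    """
    if width < 1 or total_digits % width != 0:
        raise ValueError("total_digits must be a positive multiple of width")
    normalized = hex_id.lower()
    if not normalized or not set(normalized) <= HEX_DIGITS:
        raise ValueError(f"not a hex ID: {hex_id!r}")
    if len(normalized) > total_digits:
        raise ValueError(f"ID {hex_id!r} is longer than {total_digits} digits")
    # Pad so short IDs such as "1" land at the same depth as long ones.
    padded = normalized.zfill(total_digits)
    segments = [padded[i:i + width] for i in range(0, total_digits, width)]
    return "/" + "/".join(segments)


def max_children(width: int = 2) -> int:
    """Upper bound on child nodes per level for a given segment width."""
    return 16 ** width


print(partition_path("123abc"))  # /12/3a/bc
print(max_children())            # 256
```

With two hex digits per segment, no node can have more than 16^2 = 256 children, which matches the figure in the quoted reply. A wider segment would give a shallower tree at the cost of longer child lists.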