From: "Sriram Narayanan"
To: users@jackrabbit.apache.org
Date: Thu, 1 Mar 2007 15:00:15 +0530
Subject: Some questions on JackRAbbit performance with large data sets
Message-ID: <49977f270703010130h2541f8aw73e9ac88f17cc17e@mail.gmail.com>

Hi list:

We are currently using Jackrabbit 1.2.1. Our nodes look like this:

/Product/Customer1/Settings
/Product/Customer1/Configuration
/Product/Customer1/DataA
/Product/Customer1/DataB
/Product/Customer2/Settings
/Product/Customer2/Configuration
/Product/Customer2/DataA
/Product/Customer2/DataB

This pattern continues up to 250 customers. When we load data for all these customers, the Derby database grows to 2.5 GB and the Lucene index to 470 MB.

We have to provide for the following:

a. Access data for around 20 customers simultaneously.
b. The queries are of the type "All attributes of a given node for a given customer".
c. Data about one customer should not be accessed by another customer.

At present, we access Jackrabbit using 20 threads and 20 different sessions. This is to achieve separation of data, among other things.

We're seeing performance figures such as the following:

Network Derby: 80 seconds for all the threads to receive results
Oracle: 35 seconds for all the threads to receive results

Some questions:

1. What lessons have community members learned from using Derby?
2. Would you recommend Oracle over Derby for such large amounts of data?
3. Are there ways to speed up Lucene searches?
4. Are Lucene searches affected by such large indexes?
5. Would it be better for us to split the repository into smaller ones, each with a smaller Lucene index?
6. For such large data, would Embedded Derby or Network Derby be suited to the task?

-- Sriram
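
For anyone wanting to reproduce this kind of measurement, the setup described above (one thread per customer, timed until every thread has its result) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the class name, the method names, and the stand-in reader are hypothetical, and the code uses only the Java standard library. It does not call the Jackrabbit API; the reader function is the place where a per-customer session would open the node at the given path and read its properties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ConcurrentReadTiming {

    /** Builds a path such as /Product/Customer7/Settings (layout taken from the message). */
    static String nodePath(int customer, String child) {
        return "/Product/Customer" + customer + "/" + child;
    }

    /**
     * Runs one reader task per customer, each on its own thread, and returns
     * the wall-clock milliseconds until every task has returned its result.
     * The reader is a hypothetical stand-in for "all attributes of a given node".
     */
    static long timeAllReaders(int customers, String child,
                               Function<String, Map<String, String>> reader) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(customers);
        try {
            List<Callable<Map<String, String>>> tasks = new ArrayList<>();
            for (int c = 1; c <= customers; c++) {
                final String path = nodePath(c, child);
                tasks.add(() -> reader.apply(path));
            }
            long start = System.nanoTime();
            // invokeAll blocks until all submitted tasks have completed.
            List<Future<Map<String, String>>> results = pool.invokeAll(tasks);
            for (Future<Map<String, String>> f : results) {
                f.get(); // rethrows any failure raised inside a reader
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder reader: returns a single fake attribute instead of querying a repository.
        Function<String, Map<String, String>> fakeReader =
                path -> Collections.singletonMap("path", path);
        long ms = timeAllReaders(20, "Settings", fakeReader);
        System.out.println("All 20 readers finished in " + ms + " ms");
    }
}
```

Timing the whole batch rather than each thread separately matches the figures quoted above, which are given as the time for all threads to receive results.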