From: Stefan Guggisberg
To: jackrabbit-dev@incubator.apache.org
Subject: Re: Jackrabbit Performance
Date: Mon, 26 Sep 2005 12:56:54 +0200
Message-ID: <90a8d1c005092603561fb7ee8f@mail.gmail.com>
In-Reply-To: <200509251339.j8PDdABr005159@post.webmailer.de>

hi daniel,

some remarks/answers follow inline:

On 9/25/05, Daniel Hagen wrote:
> Hi,
>
> I apologize if this is the wrong place to ask my questions but I do not know
> where else I should ask.
>
> I am currently considering the use of Jackrabbit in a future project.
> The (very) rough layout I am thinking about is JBoss as Application Server
> and Jackrabbit for content storage (equipped with a custom access manager
> and login module for authentication & authorization).
>
> But I am not sure whether Jackrabbit will be able to handle the amount of
> data we will have to deal with.
> The application might have to handle ~2000 - 5000 new documents/day (size
> ranging from 2 KB to 1 MB, I assume an average of ~50 KB).
> Each document will have about 5 - 10 simple text properties and the "binary"
> content of the documents (plain text/HTML/MS Word/PDF) will have to be
> indexed for a fulltext search.
> Read access to the contents will not be very frequent, I am assuming 5
> requests for the mentioned simple properties of a node per minute, 5
> concurrent users, access to binary contents will probably appear once every
> minute.
>
> In short: The application will have to be able to do a fulltext search on
> (worst case) more than 10,000,000 contents and will have to handle creation
> of new contents without stalling the server.
>
> What is your opinion, is Jackrabbit the right tool for the task?
> Which Persistence Manager would be the best choice?
> Are there any special hardware considerations I should think about (e.g.
> separating index and storage on separate discs using separate controllers
> ...)?
> Should we have OS preferences for the server (current options are Windows
> 2003 Server vs. Linux with a strong preference towards Windows 2003 Server)?

if you're using a filesystem-based pm (e.g. ObjectPersistenceManager on
LocalFileSystem) i'd definitely go for linux. the windows filesystem
really sucks with a large number of small files. with the CQFileSystem
(custom filesystem in-a-file) you can improve the performance on a windows
box considerably, but it's not open source and it's only free for
non-commercial use.

ObjectPersistenceManager w/LocalFileSystem on a linux box provides imo
a decent performance; its major flaw is that it is non-transactional.

there's also a jdbc-based pm in the contrib directory (contrib/db-persistence).
it is transactional and, depending on the type of database (e.g. mysql),
provides a very decent performance.

i suggest you set up your own performance/scalability tests.

cheers
stefan

>
> I know that not all of my questions are directly related to Jackrabbit
> Development and some will probably not be answered due to a lack of existing
> data, but any clues/hints will be greatly appreciated.
>
> Thank you for your help!
>
> Daniel
>
>
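
as a starting point for such a test, here is a minimal sketch that only
measures how quickly a machine's filesystem creates many small files, which
is the workload a filesystem-based pm puts on the disk. it does not use
jackrabbit or the jcr api at all. the class name SmallFileProbe, the
256-subdirectory layout, and the default of 5000 files at 2 KB each (taken
from the figures in the question) are assumptions chosen for illustration,
not anything jackrabbit itself does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Rough probe (hypothetical helper, not part of Jackrabbit): small-file write speed. */
public class SmallFileProbe {

    /** Writes count files of sizeBytes each below dir; returns elapsed nanoseconds. */
    public static long writeSmallFiles(Path dir, int count, int sizeBytes) throws IOException {
        byte[] payload = new byte[sizeBytes];
        new Random(42).nextBytes(payload); // fixed seed: same data on every run
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // Spread files over up to 256 subdirectories (an assumed layout)
            // so that no single directory grows very large.
            Path sub = dir.resolve(String.format("%02x", i % 256));
            Files.createDirectories(sub);
            Files.write(sub.resolve("item-" + i + ".bin"), payload);
        }
        return System.nanoTime() - start;
    }

    /** Counts the regular files below dir. */
    public static long countFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    /** Deletes dir recursively, children before parents. */
    public static void deleteTree(Path dir) throws IOException {
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the question: up to 5000 new documents/day, 2 KB minimum size.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 5000;
        int sizeBytes = 2 * 1024;
        Path dir = Files.createTempDirectory("small-file-probe");
        try {
            long nanos = writeSmallFiles(dir, count, sizeBytes);
            System.out.printf("wrote %d files of %d bytes in %d ms%n",
                    countFiles(dir), sizeBytes, nanos / 1_000_000);
        } finally {
            deleteTree(dir);
        }
    }
}
```

running it with the same file count on the windows and linux candidates gives
a first, rough comparison of the two filesystems. a realistic test would also
have to go through the repository itself (node creation, fulltext indexing,
search) with the persistence manager under consideration.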