From: Ard Schrijvers
To: dev@jackrabbit.apache.org
Subject: Re: Jackrabbit performance data
Date: Mon, 9 Aug 2010 17:36:15 +0200

Hello,

On Mon, Aug 9, 2010 at 4:54 PM,
Jukka Zitting wrote:
> Hi,
>
> On Mon, Aug 9, 2010 at 3:53 PM, Ard Schrijvers wrote:
>> First of all, thanks a lot for this Jukka. I really like it. Would you
>> have an idea how we could measure performance for larger repositories?
>> For example, I would be glad to add some query performance tests, but,
>> obviously, querying can be very sensitive to the number of nodes. I
>> would be interested in the performance of some queries (of xpath, sql
>> and qom) against different repository versions, but then specifically
>> queries against large repositories. I understand if it is not feasible
>> because the tests would take too long. WDYT?
>
> The size of the test repository shouldn't be too much of a problem, as
> long as the setup/teardown code doesn't take hours to complete. A few
> minutes per test is still quite OK; you can create quite a bit of test
> content in that time.

The thing is that I am particularly interested in running searches
against, say, 100K+ nodes. I have downloaded 6 GB of wiki XML pages. I
would like to see search improvements or degradations between versions
when the amount of data is large. It is important that we notice when we
implement a search feature that doesn't scale too well. Obviously, the
search index in the unit tests is an in-memory one, which might also
influence the real numbers.

> The test suite currently doesn't allow multiple
> different tests to share test content, but that should be easy to
> solve by introducing a concept of test groups with their own
> setup/teardown phases.

Yes, true. It only gets hard when some tests modify the data. That might
in turn influence other tests, giving us hard-to-see interdependence
between tests. Certainly for searching, for example, it is interesting
to see *how* long the first search in a warmed-up environment takes
*after* some persisted modification.

>
> A more essential consideration is the time it takes to execute a
> single test query.
> Currently the test suite is configured to spend 50
> seconds iterating over a single performance test, so to get good
> statistics an individual test shouldn't take much longer than a few
> seconds. We can increase the execution time, but I think a few seconds
> should in any case be the upper limit for most interesting search use
> cases.

Well, I think that when a search takes more than a second we should hear
alarm bells anyway :) so 50 seconds seems more than fine to me.

>
> See the simple search test case I added in revision 983662. It would

Thank you so much, you make it very easy for me :)

> be great if you'd be interested in adding more complex search
> benchmarks.

Yes, I am. I'll try to find some time on short notice (probably my spare
time, so hopefully this weekend or next) to play around with the tests
and add a bunch of search tests. I am interested in seeing how some
searches evolve between versions, and also how they scale within a
single repository version, for example path-constrained searches and
range queries.

I am particularly interested in the search performance numbers because I
think we need to invest time in some search refactoring. I think the
Jackrabbit search implementation was really state of the art relative to
the Lucene version it was originally built against. But now it suffers,
imo, from some historically grown things, like the 'multiple indexes'.
(I tested IndexReader.reopen(), available since Lucene 2.3.0, against a
couple of million Lucene docs: I think this reopen does pretty much on
the Lucene segment level what Jackrabbit does on the index level. It
keeps all valid segments open. We could make so much code simpler.) I
had an interesting talk with a Hibernate Search developer who faces
similar requirements, like real-time search.
Also, recent Lucene improvements like TrieRange queries, and upcoming
features like NestedDocumentQueries and incremental field updates, might
become really interesting for us as well. But I do not want to go into
too much detail now, as that belongs in another thread. I hope to get
back to this before too long.

To get back to this thread: the performance plots will hopefully make my
findings visible, or show that I am just wrong :)

Thanks a lot again Jukka,

Regards
Ard

>
> BR,
>
> Jukka Zitting
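
PS: To make the test-group idea discussed above a bit more concrete
(shared content created once per group, each test then iterated for a
fixed time budget), here is a minimal Java sketch. All names in it
(TestGroupSketch, PerfTest, TestGroup, iterate, median, runGroup) are
hypothetical and not taken from the actual performance test suite. The
in-memory list only stands in for repository content, and the time
budget is shortened from the 50 seconds mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the API of the Jackrabbit performance suite. */
public class TestGroupSketch {

    /** One benchmark body. Real JCR calls throw checked exceptions
     *  and would need to be declared or wrapped. */
    public interface PerfTest {
        void run();
    }

    /** Tests that share content created once in setUp(). */
    public abstract static class TestGroup {
        final Map<String, PerfTest> tests = new LinkedHashMap<>();
        abstract void setUp();
        abstract void tearDown();
    }

    /** Repeats the test until the budget is spent; returns the
     *  duration of each iteration in milliseconds. */
    public static List<Long> iterate(PerfTest test, long budgetMillis) {
        List<Long> timings = new ArrayList<>();
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        do {
            long start = System.nanoTime();
            test.run();
            timings.add((System.nanoTime() - start) / 1_000_000L);
        } while (System.nanoTime() < deadline);
        return timings;
    }

    /** Median of the timings (upper middle for even sizes). */
    public static long median(List<Long> timings) {
        List<Long> sorted = new ArrayList<>(timings);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    /** Sets up once, runs every test in the group, always tears down. */
    public static void runGroup(TestGroup group, long budgetMillis) {
        group.setUp();
        try {
            for (Map.Entry<String, PerfTest> e : group.tests.entrySet()) {
                List<Long> timings = iterate(e.getValue(), budgetMillis);
                System.out.println(e.getKey() + ": " + timings.size()
                        + " iterations, median " + median(timings) + " ms");
            }
        } finally {
            group.tearDown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for shared repository content.
        final List<String> content = new ArrayList<>();
        TestGroup group = new TestGroup() {
            @Override void setUp() {
                for (int i = 0; i < 100_000; i++) content.add("node" + i);
            }
            @Override void tearDown() { content.clear(); }
        };
        // Read-only test: safe to share content with other tests.
        group.tests.put("linear-scan", () -> content.contains("node99999"));
        runGroup(group, 200);
    }
}
```

A sketch like this only covers read-only tests. Tests that modify the
shared content would need their own group, or a way to restore the
content afterwards, to avoid the interdependence mentioned above.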