From: mslama@email.cz
To: users@jackrabbit.apache.org
Date: Fri, 03 Feb 2012 13:13:25 +0100 (CET)
Subject: XPath query performance question
Message-Id: <98772.1177.1935-26405-1397053679-1328271205@email.cz>

Hi,

I have the following use case. I have about 2000 company nodes under the node companies:

/companies/company[1]
/companies/company[2]
....
/companies/company[N]

I query for one company by property value: an exact match, no wildcards. The result should contain just one node. For example, I use the query:

//companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2c970a55-e08d-3af8-ad1d-3c46f341e749']

and then a single call to NodeIterator.next to get the unique result (or the first one, as there is no constraint on uniqueness). So there is no big result set.

The property 'calais' is of string type and, when set, it is unique; i.e. a small number of company nodes may have this property either empty or missing. The property value can be up to 100 characters long, in case that makes any difference for the index.

When only one thread is running, a lookup takes 100-200 ms. When 4 threads are running, it takes about 500 ms on average. I used a profiler with sampling to get some profiling data. This seems to be too much, given that the number of nodes is not that high and the query is using the Lucene index. The calls to query.execute and nodeIterator.next each take about the same time. When I checked thread dumps, the query was using the Lucene index, so it does not look like it scans all nodes.

Question: Is there any way to speed up this kind of lookup? The only way I have found so far is to incorporate the property most often used for lookup into the node path, as session.getNode(path) is much faster.

I use Jackrabbit 2.2.9 and Postgres 9.1 for saving all data except the Lucene index. It runs on JBoss 7.

I searched for Jackrabbit XPath performance but found no match for my use case:
a) exact property match without like/wildcards
b) small result set, just one result item

Thanks
Marek
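
A minimal sketch of the path-based workaround described in the message above, assuming the lookup property value is hashed into a node name so that the node can be fetched with session.getNode(path) instead of a query. The class CompanyLookup, its two methods, and the choice of SHA-256 hex as the node name are all hypothetical and not taken from the original message or from Jackrabbit; the sketch uses only the Java standard library and does not touch a repository.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Jackrabbit or of the original message.
final class CompanyLookup {

    private CompanyLookup() {}

    // Builds the XPath statement used in the message for a given 'calais' value.
    // A single quote inside the value is escaped by doubling it.
    public static String xpathFor(String calais) {
        return "//companies/company[@calais='" + calais.replace("'", "''") + "']";
    }

    // Assumed naming scheme: the node name is the SHA-256 hex digest of the
    // 'calais' value, so the same value always maps to the same path and the
    // name contains only the characters 0-9 and a-f.
    public static String pathFor(String calais) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(calais.getBytes(StandardCharsets.UTF_8));
            StringBuilder name = new StringBuilder();
            for (byte b : digest) {
                name.append(String.format("%02x", b & 0xff));
            }
            return "/companies/" + name;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

With such a scheme, each company node would have to be created at CompanyLookup.pathFor(value) when the property is set, and a lookup would become session.getNode(CompanyLookup.pathFor(value)). Nodes whose 'calais' property is empty or missing would still need some other location, which this sketch does not address.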