jackrabbit-users mailing list archives

From msl...@email.cz
Subject Re: Re: Re: XPath query performance question
Date Fri, 03 Feb 2012 15:51:59 GMT
Ok. I tried

//*[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf']

instead of the original

//companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf']

and now query.execute plus nodeIterator.next is 10 times faster. (As the property names we
use are unique, I can use //* and get the same result.) I must say it is quite unintuitive
behaviour. Hopefully it will help others too.
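
For readers who want to compare the two query forms side by side, here is a minimal sketch that builds both statements from a property value. The class and method names (CalaisQueries, literal, withPath, anywhere) are hypothetical and not part of Jackrabbit, and the escaping rule (doubling an apostrophe inside the string literal) is an assumption worth checking against the query syntax of your Jackrabbit version.

```java
// Hypothetical helper, not part of Jackrabbit: builds the two XPath
// statements compared in this message so they can be timed side by side.
final class CalaisQueries {

    /** Wraps a value in single quotes; assumes '' is the escape for an apostrophe. */
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Original form: property predicate plus a path constraint. */
    static String withPath(String calais) {
        return "//companies/company[@calais=" + literal(calais) + "]";
    }

    /** Rewritten form: property predicate only, no path constraint. */
    static String anywhere(String calais) {
        return "//*[@calais=" + literal(calais) + "]";
    }

    public static void main(String[] args) {
        String id = "http://d.opencalais.com/er/company/ralg-tr1r/"
                + "2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf";
        // Either statement would then go to the JCR QueryManager, e.g.
        // queryManager.createQuery(statement, Query.XPATH).execute().getNodes()
        System.out.println(withPath(id));
        System.out.println(anywhere(id));
    }
}
```

The two forms return the same node only when the property is not used outside the /companies subtree, which is the case described here.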

Thanks for the hint

Marek
> ------------ Original message ------------
> From:  <mslama@email.cz>
> Subject: Re: Re: XPath query performance question
> Date: 03.2.2012 16:25:07
> ----------------------------------------
> 
> No, the property 'calais' is not used anywhere else. So if I use the query
> without path info, it will return the same result.
> 
> Marek
> 
> > ------------ Original message ------------
> > From: Alessandro <alessandro.bologna@gmail.com>
> > Subject: Re: XPath query performance question
> > Date: 03.2.2012 16:12:15
> > ----------------------------------------
> > If you were running the query without path restrictions, would it return more
> > than one node? In other words, outside the /companies tree, are there other
> > company nodes with the same calais attribute value?
> > Results are generated from the predicate, and then filtered by the path.
> > 
> > Alessandro 
> > 
> > On Feb 3, 2012, at 7:13 AM, mslama@email.cz wrote:
> > 
> > > Hi,
> > > 
> > > I have the following use case:
> > > 
> > > I have about 2000 company nodes under the node companies:
> > > /companies/company[1]
> > > /companies/company[2]
> > > ....
> > > /companies/company[N]
> > > 
> > > I query for one company by property value: exact match, no wildcards. The
> > > result should contain just one node. For example, I use the query:
> > > 
> > > //companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2c970a55-e08d-3af8-ad1d-3c46f341e749']
> > > 
> > > and then one call of NodeIterator.next to get the unique result (or the first
> > > one, as there is no constraint on uniqueness). So there is no big result set.
> > > 
> > > Property 'calais' is of string type and, when set, it is unique, i.e. a small
> > > number of company nodes may have this property either empty or missing. The
> > > property value can be up to 100 chars long, if that makes any difference for
> > > the index.
> > > 
> > > When only one thread is running it takes 100-200ms. When 4 threads are
> > > running it is about 500ms on average. I used a profiler with sampling to get
> > > some profiling data. It seems to be too much, given that the number of nodes
> > > is not that high and it is using the Lucene index. Calls of query.execute and
> > > nodeIterator.next both take about the same time. When I checked thread dumps
> > > it uses the Lucene index, so it does not look like it scans all nodes.
> > > 
> > > Question: Is there any way to speed up this kind of lookup? The only way I
> > > have found so far is to incorporate the property most often used for lookup
> > > into the node path, as session.getNode(path) is much faster.
> > > 
> > > I use Jackrabbit 2.2.9 and Postgres 9.1 for saving all data except the Lucene
> > > index. It runs on JBoss 7.
> > > 
> > > I searched for Jackrabbit XPath performance but found no match for my use case:
> > > a) exact property match without like/wildcards
> > > b) small resultset - just one result item
> > > 
> > > Thanks
> > > 
> > > Marek
> > 
> > 
> > 
> 
> Marek Slama
> mslama@email.cz
> 
> 
> 

Marek Slama
mslama@email.cz
