jackrabbit-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: improving the scalability in searching part 2
Date Tue, 14 Aug 2007 14:29:28 GMT

> On 8/8/07, Ard Schrijvers <a.schrijvers@hippo.nl> wrote:
> > ...2) The XPath jcr:like implementation, for example:
> > //*[jcr:like(@mytext,'%foo bar qu%')]
> > ...the current jcr:like results in queries taking up to 10 seconds to
> > complete for only 1000 nodes with one property, "mytext", which is on
> > average 500 words long....
> 
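A minimal Java sketch of what evaluating such a pattern the naive way costs: the jcr:like pattern is translated to a regular expression ('%' as any run of characters, '_' as exactly one character) and tested against the full text of every node. This is an illustration under that assumption, not Jackrabbit's actual implementation. The names LikeScan, likeToRegex and scan are hypothetical, and escape sequences in the pattern are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LikeScan {

    // Translate a jcr:like pattern into a regex: '%' -> ".*", '_' -> ".",
    // everything else matched literally. Escape sequences are NOT handled.
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // DOTALL so that '%' also spans line breaks inside long text bodies.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Naive evaluation: every node's whole text is tested against the pattern,
    // so the cost grows with (number of nodes) x (text length).
    static List<Integer> scan(List<String> texts, String like) {
        Pattern p = likeToRegex(like);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            if (p.matcher(texts.get(i)).matches()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> texts = List.of(
                "some text with foo bar quux in the middle",
                "no match here",
                "xfoo bar qu");
        System.out.println(scan(texts, "%foo bar qu%")); // [0, 2]
    }
}
```

Every call to scan touches the whole text of every node, which is why the cost grows with both the number of nodes and the length of the property.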

Bertrand Delacretaz wrote:

> Just curious, is
> 
>   %foo bar qu%
> 
> much slower than
> 
>   foo bar qu%
> 
> ?
> 
> I'd guess so, as Lucene-based indexes are usually inefficient with
> leading wildcards. Do your tests confirm that?

Yes, they do. A leading wildcard is incredibly slow for text bodies. Using only trailing wildcards
seems to be fast enough, though it probably scales linearly with the number of documents, since it
is probably implemented as a startsWith check on a Lucene field value. For a leading wildcard, I
think some sort of two-step filter might work: first expand the leading-wildcard term to all index
terms that end with it, then look up the documents in the full-text index that contain any of those
terms, and finally run the current filter over this reduced set only. WDYT?
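The two-step idea can be sketched with a toy in-memory index in plain Java. A sorted TreeMap stands in for a Lucene term dictionary; this is an assumption for illustration, not Lucene's API, and TwoStepLike, expandPrefix, expandSuffix and query are hypothetical names. The sketch also shows why the wildcard position matters: a trailing wildcard reads one contiguous range of the sorted terms, while a leading wildcard has to test every term. It assumes the first word of the pattern is a single alphanumeric token and that the index lowercases terms, so step 1 never drops a document that the exact check in step 2 would accept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class TwoStepLike {

    // Toy inverted index (hypothetical stand-in for a Lucene term dictionary):
    // sorted term -> ids of the documents that contain it.
    final TreeMap<String, Set<Integer>> postings = new TreeMap<>();
    final List<String> docs = new ArrayList<>();

    // Assumed analysis: lowercase, split on anything that is not a letter or digit.
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Trailing wildcard ("qu%"): terms sharing a prefix form one contiguous
    // range of the sorted dictionary, so only that range is read.
    SortedSet<String> expandPrefix(String prefix) {
        return new TreeSet<>(postings.subMap(prefix, prefix + Character.MAX_VALUE).keySet());
    }

    // Leading wildcard ("%foo"): the sort order does not help, every term is tested.
    SortedSet<String> expandSuffix(String suffix) {
        SortedSet<String> out = new TreeSet<>();
        for (String term : postings.keySet()) {
            if (term.endsWith(suffix)) {
                out.add(term);
            }
        }
        return out;
    }

    // Step 1: collect documents containing any expansion of the first word.
    // Step 2: run the exact (expensive) pattern only on those candidates.
    List<Integer> query(String firstWordSuffix, Pattern exact) {
        Set<Integer> candidates = new TreeSet<>();
        for (String term : expandSuffix(firstWordSuffix.toLowerCase())) {
            candidates.addAll(postings.get(term));
        }
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates) {
            if (exact.matcher(docs.get(id)).matches()) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoStepLike index = new TwoStepLike();
        index.add("some text with foo bar quux in it");
        index.add("a seafoo bar query");
        index.add("foo alone, then bar quux later");
        index.add("nothing relevant");

        // Regex equivalent of the jcr:like pattern '%foo bar qu%'.
        Pattern exact = Pattern.compile(".*" + Pattern.quote("foo bar qu") + ".*", Pattern.DOTALL);

        System.out.println(index.expandPrefix("qu")); // [query, quux]
        System.out.println(index.query("foo", exact)); // [0, 1]
    }
}
```

Only the candidates reach the exact match: document 2 contains foo, bar and quux but not the phrase, so it survives step 1 and is dropped in step 2, and document 3 is never examined at all.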

The org.apache.lucene.misc.ChainedFilter seems suitable for the job, though I haven't worked
with it yet. 
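Without relying on the actual ChainedFilter API, the chaining idea can be sketched in plain Java with java.util.BitSet, where a bit set over document numbers plays the role of a filter result. ChainSketch, filter and chainAnd are hypothetical names. The sketch contrasts ANDing two fully computed bit sets with evaluating the expensive check only on the documents that passed the cheap one; the string checks are stand-ins for the index lookup and the jcr:like filter.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class ChainSketch {

    // Evaluate a predicate over every document number and record the result
    // as a bit set (bit i set = document i passed). Hypothetical helper.
    static BitSet filter(int maxDoc, IntPredicate accept) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (accept.test(doc)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // AND-chaining that only evaluates the expensive predicate on documents
    // already accepted by the cheap filter.
    static BitSet chainAnd(BitSet cheap, IntPredicate expensive) {
        BitSet result = new BitSet();
        for (int doc = cheap.nextSetBit(0); doc >= 0; doc = cheap.nextSetBit(doc + 1)) {
            if (expensive.test(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] docs = {"foo bar quux", "nothing", "seafoo bar query", "bar"};
        int[] calls = {0}; // counts evaluations of the expensive predicate

        // Stand-in for the index lookup of step 1 (cheap).
        IntPredicate cheap = d -> docs[d].contains("foo");
        // Stand-in for the current jcr:like check of step 2 (expensive).
        IntPredicate expensive = d -> {
            calls[0]++;
            return docs[d].contains("foo bar qu");
        };

        // Plain AND of two fully computed bit sets: expensive runs on every doc.
        BitSet eager = filter(docs.length, cheap);
        eager.and(filter(docs.length, expensive));
        int eagerCalls = calls[0];

        // Chained: expensive runs only on the cheap filter's survivors.
        calls[0] = 0;
        BitSet chained = chainAnd(filter(docs.length, cheap), expensive);

        System.out.println(eager + " " + chained + " " + eagerCalls + " " + calls[0]);
        // prints "{0, 2} {0, 2} 4 2"
    }
}
```

Both variants return the same documents; the difference is how often the expensive predicate runs (every document versus only the survivors of the cheap filter), which is the saving the two-step filter is after.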

Regards Ard 

> 
> -Bertrand
> 