jackrabbit-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject improving the scalability in searching part 2
Date Wed, 08 Aug 2007 14:47:39 GMT
Problem 2:

2) The XPath jcr:like implementation, for example: //*[jcr:like(@mytext,'%foo bar qu%')]

The jcr:like implementation (the same holds for SQL LIKE) is translated into a Jackrabbit
WildcardQuery, which in turn uses a WildcardTermEnum with a "protected boolean termCompare(Term term)"
method (though I haven't yet pinned down exactly where the bottleneck is).
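To make the cost concrete, here is a minimal sketch in plain Java (no Lucene, not Jackrabbit's
actual WildcardTermEnum) of what a wildcard term scan does in principle. It assumes, for
illustration only, one term per property value, and it uses a hypothetical likeToRegex helper
that maps '%' to ".*" and '_' to ".". Because '%foo bar qu%' starts with a wildcard there is no
literal prefix to seek to, so every term has to go through the per-term compare.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class LikeScanSketch {

    // Hypothetical helper (assumption, not Jackrabbit code): translate a jcr:like
    // pattern into a regex. '%' = any sequence, '_' = one character, rest is literal.
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Walk a sorted "term dictionary" like a term enum would: the literal prefix before
    // the first wildcard lets us skip ahead, but a leading '%' leaves an empty prefix,
    // so every term is compared. compared[0] counts the per-term compare calls.
    static List<String> scan(TreeSet<String> terms, String like, int[] compared) {
        int firstWildcard = like.length();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '%' || c == '_') {
                firstWildcard = i;
                break;
            }
        }
        String prefix = like.substring(0, firstWildcard);
        Pattern pattern = likeToRegex(like);
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // left the prefix range, no further term can match
            }
            compared[0]++;
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        terms.add("some text with foo bar quux in it");
        terms.add("nothing relevant here");
        terms.add("another foo bar question");
        int[] compared = {0};
        List<String> hits = scan(terms, "%foo bar qu%", compared);
        System.out.println(hits.size() + " hits, " + compared[0] + " terms compared");
        // prints "2 hits, 3 terms compared"
    }
}
```

With long property values (500 words each) every one of those compares is a regex match over
the whole value, which is consistent with the slow timings below.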

In short, searching for nodes that contain some string in some property means scanning
UN_TOKENIZED fields in Lucene, which is, IMHO, not the way to do it (though I haven't yet
found *the* solution for the wildcard parts). Without the wildcards, a PhraseQuery on the
indexed TOKENIZED property field <X:FULL:myproperty> would obviously do.
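For the wildcard-free case, a phrase match only needs token positions from the tokenized field.
The sketch below is plain Java illustrating the idea, not Lucene's PhraseQuery; the simple
non-word-character tokenizer and the in-memory postings map are assumptions for illustration.
It handles exact tokens only: the partial tokens implied by '%foo' and 'qu%' are exactly the
part it does not address.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseSketch {

    // Build positional postings for one tokenized field value: token -> positions.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // leading separator yields an empty token; skip it
            }
            postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
        }
        return postings;
    }

    // Exact phrase match: there is a start position p such that phrase[i] occurs at p + i.
    // Expects at least one phrase term.
    static boolean containsPhrase(Map<String, List<Integer>> postings, String... phrase) {
        List<Integer> starts = postings.get(phrase[0]);
        if (starts == null) {
            return false;
        }
        for (int p : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                List<Integer> positions = postings.get(phrase[i]);
                ok = positions != null && positions.contains(p + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index("Some text with foo bar quux in it");
        System.out.println(containsPhrase(postings, "foo", "bar")); // true
        System.out.println(containsPhrase(postings, "bar", "foo")); // false (wrong order)
    }
}
```

Only the postings of the phrase terms are touched, instead of every stored property value,
which is why the tokenized field is the more scalable place to search.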

Anyway, the current jcr:like results in queries taking up to 10 seconds to complete for only
1000 nodes with one property, "mytext", which is on average 500 words long. Using a cached
IndexReader does not make it any faster.

With the current implementation, I think jcr:like is not usable. Perhaps somebody else knows
how to use the PhraseQuery in a way that suits our needs (I have already asked on the Lucene
list whether there is a best way to implement 'like' functionality).

Regards Ard


Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
