jackrabbit-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject improving the scalability in searching part 2
Date Wed, 08 Aug 2007 14:47:39 GMT
Problem 2:

2) The XPath jcr:like implementation, for example: //*[jcr:like(@mytext,'%foo bar qu%')]

The jcr:like implementation (the same holds for SQL) is translated to a Jackrabbit WildcardQuery,
which in turn uses a WildcardTermEnum with a "protected boolean termCompare(Term term)"
method (though I haven't yet worked out exactly where the bottleneck is).
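
To illustrate one plausible source of the cost, here is a minimal Java sketch. It is not
Jackrabbit or Lucene code. It assumes the untokenized field values sit in a sorted term
dictionary (modelled as a TreeSet) and that each candidate term is compared against the
pattern, roughly in the spirit of termCompare. The class LikeScanSketch, its methods and the
comparisons counter are hypothetical names, and escape characters in the pattern are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a term scan over one untokenized field. */
class LikeScanSketch {

    /** Number of terms compared by the most recent matchingTerms call. */
    static int comparisons = 0;

    /** Translates a LIKE pattern ('%' = any run of characters, '_' = one character) to a regex. */
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns the literal text before the first wildcard (empty for a leading wildcard). */
    static String literalPrefix(String like) {
        int i = 0;
        while (i < like.length() && like.charAt(i) != '%' && like.charAt(i) != '_') {
            i++;
        }
        return like.substring(0, i);
    }

    /** Scans the sorted terms and returns those that match the LIKE pattern. */
    static List<String> matchingTerms(TreeSet<String> terms, String like) {
        comparisons = 0;
        Pattern pattern = likeToRegex(like);
        String prefix = literalPrefix(like);
        List<String> hits = new ArrayList<>();
        // With a literal prefix we can seek into the sorted terms and stop early.
        // With a leading '%' the prefix is empty, so every term is visited.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            comparisons++;
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "the foo bar quux", "foo bar", "another text", "foo bar quiz"));
        System.out.println(matchingTerms(terms, "%foo bar qu%") + " comparisons=" + comparisons);
        System.out.println(matchingTerms(terms, "foo bar%") + " comparisons=" + comparisons);
    }
}
```

In this model a pattern such as 'foo bar%' only touches the terms sharing the literal prefix,
while '%foo bar qu%' has to compare every term in the field, and each term here is a whole
property value.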

What it boils down to is that searching for nodes which have some string in some property
means scanning UN_TOKENIZED fields in Lucene, which is, IMHO, not the way to do it (though
I haven't yet got *the* solution for the wildcard parts). Without the wildcards, a PhraseQuery
on the indexed TOKENIZED property field <X:FULL:myproperty> would obviously do.
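
As a rough illustration of why a phrase lookup on a tokenized field avoids the scan, here is a
minimal Java sketch of positional phrase matching. It is not Lucene's PhraseQuery, only a model
of the idea: each token maps to the documents and positions where it occurs, and a phrase
matches where its words sit at consecutive positions. The class PhraseSketch, its methods and
the whitespace tokenizer are assumptions made for the example, and the wildcard parts of the
like pattern are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical positional inverted index for one tokenized property. */
class PhraseSketch {

    // term -> (document id -> positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Tokenizes on whitespace and records the position of every token. */
    void add(int docId, String text) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Returns the ids of documents containing the words at consecutive positions. */
    TreeSet<Integer> phrase(String... words) {
        TreeSet<Integer> hits = new TreeSet<>();
        if (words.length == 0) {
            return hits;
        }
        Map<Integer, List<Integer>> first = postings.get(words[0]);
        if (first == null) {
            return hits;
        }
        // Only documents containing the first word are examined at all.
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (followsAt(entry.getKey(), start, words)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    /** Checks that words[1..] occur directly after position start in the given document. */
    private boolean followsAt(int docId, int start, String[] words) {
        for (int i = 1; i < words.length; i++) {
            Map<Integer, List<Integer>> byDoc = postings.get(words[i]);
            if (byDoc == null) {
                return false;
            }
            List<Integer> positions = byDoc.get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhraseSketch index = new PhraseSketch();
        index.add(1, "a foo bar quux");
        index.add(2, "bar foo");
        index.add(3, "foo baz bar");
        System.out.println(index.phrase("foo", "bar"));
    }
}
```

The lookup cost here depends on the postings of the phrase words rather than on the number of
distinct values in the field, which is the property the untokenized scan lacks.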


Anyway, the current jcr:like results in queries taking up to 10 seconds to complete for only
1000 nodes with a single property, "mytext", which is on average 500 words long. A cached IndexReader
won't make it any faster.

I think jcr:like is not usable with the current implementation. Perhaps somebody
else knows how to use a PhraseQuery in a way that suits our needs (I have already asked
on the Lucene list whether there is a best way to implement 'like' functionality).

Regards Ard

-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
-------------------------------------------------------------- 
