jackrabbit-dev mailing list archives

From Cédric Damioli <cedric.dami...@anyware-tech.com>
Subject Lucene Analyzer not used when querying the index ?
Date Thu, 23 Feb 2006 15:19:42 GMT
Hi all,

I noticed that no Lucene Analyzer is used when querying the repository:
when building the actual Lucene query,
o.a.j.c.query.lucene.LuceneQueryBuilder does not make any use of the
Analyzer (at least in my case).
So the input pattern is not tokenized the way the indexed text was, and
the query does not return the expected result.

Let me describe my example: I'm using Chinese characters, say A and B. I
set a property named "title" with the value "AB" (the two Chinese
characters without any whitespace).
After indexing (with the default StandardAnalyzer) the text has been
tokenized and the index contains at least three noticeable terms:
- one associated with the field _PROPERTIES and the value "title\uFFFFAB"
  (property name and value joined by the U+FFFF character)
- one associated with the field FULL:title and the value "A"
- one associated with the field FULL:title and the value "B"
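
To make the three terms concrete, here is a minimal plain-Java sketch
that reproduces them. It is only an illustration, not Jackrabbit's
indexing code: the class and method names are hypothetical, the
one-token-per-character tokenizer is a simplified stand-in for what
StandardAnalyzer does with Chinese text, the U+FFFF separator is an
assumption, and U+4E2D U+6587 stand in for "A" and "B".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: mimics the index terms listed in the message.
final class IndexTermsSketch {

    // Assumption: the property name and value are joined by U+FFFF.
    static final char SEPARATOR = '\uFFFF';

    // Simplified stand-in for an analyzer that emits one token per character.
    static List<String> tokenizePerCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!Character.isWhitespace(codePoint)) {
                tokens.add(new String(Character.toChars(codePoint)));
            }
            i += Character.charCount(codePoint);
        }
        return tokens;
    }

    // Returns "field=value" strings: one untokenized _PROPERTIES term,
    // then one FULL:<property> term per token of the value.
    static List<String> termsFor(String property, String value) {
        List<String> terms = new ArrayList<>();
        terms.add("_PROPERTIES=" + property + SEPARATOR + value);
        for (String token : tokenizePerCharacter(value)) {
            terms.add("FULL:" + property + "=" + token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Two Chinese characters standing in for "AB".
        for (String term : termsFor("title", "\u4E2D\u6587")) {
            System.out.println(term);
        }
    }
}
```

The point of the sketch is that the two-character string "AB" survives
only in the _PROPERTIES term; the FULL:title field holds "A" and "B" as
separate terms.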

After that I try to execute an XPath query like

    //*[jcr:contains(@title, '*AB*')]

I expected this query to match the property set above, but it returned
no results.
After looking at the code, I can say that the Analyzer is not called for
a WildcardQuery, so my "AB" is not tokenized. Furthermore, it seems that
the _PROPERTIES field is not used when searching; otherwise, I think it
would match.
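
The mismatch can be sketched in plain Java without Lucene. A wildcard
query is matched against individual index terms, so a pattern that
spans two tokens finds nothing in a field that was tokenized one
character at a time. The class and method names below are hypothetical,
the regex translation is a simplification of wildcard matching, and
U+4E2D U+6587 again stand in for "A" and "B".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: wildcard matching against single index terms.
final class WildcardVsTermsSketch {

    // Translates a wildcard pattern ('*' = any run, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True if at least one single term matches the whole pattern.
    static boolean matchesAnyTerm(String wildcard, List<String> terms) {
        Pattern pattern = toRegex(wildcard);
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms of the tokenized FULL:title field: "A" and "B" separately.
        List<String> fullTextTerms = Arrays.asList("\u4E2D", "\u6587");
        // Assumed shape of the untokenized _PROPERTIES term.
        List<String> propertyTerms = Arrays.asList("title\uFFFF\u4E2D\u6587");

        String query = "*\u4E2D\u6587*"; // corresponds to '*AB*'
        System.out.println(matchesAnyTerm(query, fullTextTerms));
        System.out.println(matchesAnyTerm(query, propertyTerms));
    }
}
```

Under these assumptions the un-analyzed pattern matches no term in the
tokenized field, while the same pattern would match the single
untokenized term, which is the reasoning behind the remark about
_PROPERTIES above.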

I know that StandardAnalyzer is not the best fit for handling Chinese
text, but that's another story.
It seems to me that there may be a Jackrabbit problem here, so I wanted
to hear your thoughts on this.

Regards,

-- 
Cédric Damioli
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com

