lucene-java-user mailing list archives

From "Karsten Konrad" <Karsten.Kon...@xtramind.com>
Subject Re: Analyzer use at search time?
Date Wed, 30 Apr 2003 09:21:17 GMT

Hi,

I had (and still have) exactly the same problem: my postfix reducer returns both an unstemmed
and a stemmed version of each word, but using this analyzer during search gives
me either fewer hits than expected or none at all.

However, using analyzers that produce more than one token per word during indexing has
other disadvantages anyway: the index gets much larger, and searches are therefore slower.
I would therefore like to use this kind of analyzer only during search, but the
behavior described above prevents that too.

So, again, is this a bug or did we overlook something?

Regards,

Karsten

-----Original Message-----
From: Armbrust, Daniel C. [mailto:Armbrust.Daniel@mayo.edu]
Sent: Tuesday, 29 April 2003 23:49
To: 'Lucene Users List'
Subject: Analyzer use at search time?


I've written an analyzer that uses a custom filter to invoke the norm function of LVG (http://umlslex.nlm.nih.gov/lvg/2003/index.html)
on each token. If norm returns more than one result for a token, the filter
puts all of the results at the same position as the original term (via setPositionIncrement(0)).
This works great while indexing documents.

When my filter is run, LVG's norm turns the word "leaves" into "leaf" and "leave".  My filter
returns 3 tokens: leaves (position increment 1), leaf (0), and leave (0).
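
A minimal sketch of the token stream described above, written in plain Java with no Lucene dependency. The class StackedTokens, the Tok type, and the NORM lookup table are hypothetical stand-ins (the real filter and LVG's norm function are not shown in this thread), and passing words without a table entry through unchanged is an assumption:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StackedTokens {
    // A token's text plus its position increment relative to the previous token.
    public static final class Tok {
        public final String text;
        public final int posInc;
        public Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    // Hypothetical stand-in for LVG's norm function: word -> normalized forms.
    static final Map<String, List<String>> NORM = new HashMap<>();
    static {
        List<String> forms = new ArrayList<>();
        forms.add("leaf");
        forms.add("leave");
        NORM.put("leaves", forms);
    }

    // Emit the original word with increment 1, then each normalized form
    // with increment 0 so that it shares the original word's position.
    public static List<Tok> expand(List<String> words) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            List<String> forms = NORM.get(w);
            if (forms == null) continue; // assumption: unknown words pass through
            for (String f : forms) {
                if (!f.equals(w)) out.add(new Tok(f, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("the");
        words.add("leaves");
        words.add("fall");
        for (Tok t : expand(words)) {
            System.out.println(t.text + " (" + t.posInc + ")");
        }
    }
}
```

For the input "the leaves fall", this model yields the (1), leaves (1), leaf (0), leave (0), fall (1), so the three variants all sit at the same position.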

Now, when I search for the word "leaves" using the same LVG-enabled analyzer, I get 0 hits.
I can see that it calls the norm function, which returns the same three words again, as I
would expect.  But I get 0 hits, even though all 3 words are in the index.

If I search with an analyzer that does not have LVG in its flow, I get the correct hits for each
of these searches:

leaves
leaf
leave

So, is there a bug in the way that analyzers are used during a search (in that it does not
expect the analyzer to return more than one word in a single position), or am I misusing Lucene?
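
One way this symptom could arise, offered as an assumption since nothing in this thread confirms it: if the search side ignores the position increments and treats the three tokens as a phrase of consecutive words, the query can never match an index in which all three terms share one position. The sketch below models that in plain Java without Lucene; the class StackedPhraseDemo and its methods index and phraseMatches are hypothetical names for illustration only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StackedPhraseDemo {
    // Build term -> positions from token texts and their position increments.
    public static Map<String, Set<Integer>> index(String[] terms, int[] posIncs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int pos = -1;
        for (int i = 0; i < terms.length; i++) {
            pos += posIncs[i];
            postings.computeIfAbsent(terms[i], k -> new TreeSet<>()).add(pos);
        }
        return postings;
    }

    // Exact phrase match: query term i must occur at (start + offsets[i]).
    public static boolean phraseMatches(Map<String, Set<Integer>> postings,
                                        String[] terms, int[] offsets) {
        Set<Integer> first = postings.get(terms[0]);
        if (first == null) return false;
        for (int p : first) {
            int start = p - offsets[0];
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                Set<Integer> ps = postings.get(terms[i]);
                ok = ps != null && ps.contains(start + offsets[i]);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Indexed document "the leaves fall" with the stacked variants.
        Map<String, Set<Integer>> postings = index(
            new String[] {"the", "leaves", "leaf", "leave", "fall"},
            new int[] {1, 1, 0, 0, 1});
        String[] query = {"leaves", "leaf", "leave"};

        // Increments ignored: terms expected at consecutive positions.
        System.out.println(phraseMatches(postings, query, new int[] {0, 1, 2})); // false
        // Increments respected: terms expected at the same position.
        System.out.println(phraseMatches(postings, query, new int[] {0, 0, 0})); // true
    }
}
```

Under this model, each single-term search (leaves, leaf, or leave) still finds the document, which is consistent with the hits seen when LVG is left out of the search-time analyzer.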


Thanks, 

Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

