lucene-java-user mailing list archives

From Luca Cavanna <cavannal...@gmail.com>
Subject Hunspell stemmer generates multiple tokens
Date Fri, 07 Jun 2013 13:16:24 GMT
Hi,
I just noticed that the HunspellStemmer outputs more than one token: the
original word plus the stems, as far as I understand.

This is not quite what I would expect, and it becomes tricky especially at
query time. For instance, when using elasticsearch to query a stemmed field,
a boolean query is generated containing multiple clauses (one for each
token emitted by the stemmer) instead of a single clause with the stem that
we expect to find in the index (assuming we indexed with stemming, of course).
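
To illustrate the difference, here is a minimal toy sketch in Python. It does
not use Lucene or elasticsearch. The two stemmer functions, their lookup
tables, and build_query are hypothetical names made up for this example; the
tables only mimic the behaviour described above and are not real Hunspell or
snowball output.

```python
from typing import Callable, List


def multi_token_stemmer(word: str) -> List[str]:
    # Hypothetical stand-in for a stemmer that emits the original word
    # plus its stems (the behaviour described in this message).
    table = {"walking": ["walking", "walk"]}
    return table.get(word, [word])


def single_token_stemmer(word: str) -> List[str]:
    # Hypothetical stand-in for a stemmer that emits exactly one token.
    table = {"walking": ["walk"]}
    return table.get(word, [word])


def build_query(field: str, term: str, stem: Callable[[str], List[str]]) -> str:
    # One clause per token emitted by the stemmer. Several tokens are
    # rendered as a parenthesised group of optional clauses, which is how
    # a boolean query is usually written in query-string form.
    clauses = [f"{field}:{token}" for token in stem(term)]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " ".join(clauses) + ")"


if __name__ == "__main__":
    print(build_query("body", "walking", multi_token_stemmer))
    print(build_query("body", "walking", single_token_stemmer))
```

With the multi-token stand-in the query ends up with two clauses, while the
single-token one produces just the one term clause I would expect.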

I would like to know whether you think this is the correct behaviour and
whether it is something you are aware of. If I look at snowball, for example,
I see that only one token is generated.


Thanks,
Luca
