lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Query based stemming
Date Fri, 07 Jan 2005 22:59:24 GMT

: >Is it possible to enable stem queries on a per-query basis? It doesn't
: >seem to be possible since the stem tokenizing is done during the
: >indexing process. Are people basically stuck with having all their
: >queries stemmed or none at all?

:  From what I've read, if you want to have a choice, the easiest way is
: to index the documents twice. Once with stemming on and once with it off
: placing the results in two different indexes.  Then at query time,
: select which index you want to use based on whether you want stemming on
: or off.

As I understand it, the intended place to implement stemming is in an
Analyzer Filter (not to be confused with a search Filter).  Since you can
specify an Analyzer when you call addDocument, you don't actually have to
have two separate indexes. You could keep all the docs in one index
(adding each one twice, once with each Analyzer) and use a search Filter
to indicate which docs to look at.

Alternately: the Analyzer's tokenStream method is given the fieldName
being analyzed, so you could write an Analyzer with a set of rules
telling it to apply your stemming filter only to certain fields. Then,
instead of having twice as many documents, you can just index your
text in two separate fields (which should be a little easier than
separate docs, because you are only duplicating the fields where stemming
is relevant).  At search time you don't have to filter anything; just
search the field that's applicable to your current desire (stemmed or
unstemmed).
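
The per-field idea above can be sketched roughly as follows. This is
plain Java, not Lucene code: in Lucene the same decision would live inside
your Analyzer's tokenStream method, keyed on the fieldName it receives.
The "_stemmed" field suffix, the class and method names, and the toy
suffix-stripping stemmer are all assumptions made up for illustration;
a real setup would use a proper stemming filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an Analyzer that picks its rules per field.
final class PerFieldStemming {

    // Toy stemmer (assumption): strips a few common English suffixes.
    // It only exists to make the stemmed/unstemmed difference visible.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Lowercases and splits the text, and stems the tokens only when the
    // field name ends in "_stemmed" (a naming convention assumed here).
    static List<String> analyze(String fieldName, String text) {
        boolean applyStemming = fieldName.endsWith("_stemmed");
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            tokens.add(applyStemming ? stem(raw) : raw);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Walking dogs jumped";
        // The same text is indexed into two fields with different analysis.
        System.out.println(analyze("body", text));         // [walking, dogs, jumped]
        System.out.println(analyze("body_stemmed", text)); // [walk, dog, jump]
    }
}
```

At query time you would run the query text through the same analysis for
whichever field you search, so a stemmed query only ever meets stemmed
terms.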

Lastly: although it's tricky to get correct, there's no law saying you
have to use the same Analyzer when you query as when you index.  You could
index your documents using an Analyzer that does no stemming, and then at
search time (if you want stemming) use an Analyzer that does "reverse
stemming" to expand your query terms out to all the possible variants.
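
One possible way to do the "reverse stemming" expansion is to group the
unstemmed terms that are actually in the index by their stem, and then
replace each query term with every term that shares its stem. The sketch
below is plain Java with no Lucene classes; the class and method names,
the toy stemmer, and the "OR"-joined output string are assumptions for
illustration only, and the list of index terms is supplied by hand here
rather than read from a real index.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical query-time expansion against an unstemmed index.
final class QueryExpansion {

    // Toy stemmer (assumption): strips a few common English suffixes.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Groups the unstemmed index terms by the stem they reduce to.
    static Map<String, SortedSet<String>> buildVariantTable(Collection<String> indexTerms) {
        Map<String, SortedSet<String>> table = new HashMap<>();
        for (String term : indexTerms) {
            table.computeIfAbsent(stem(term), k -> new TreeSet<>()).add(term);
        }
        return table;
    }

    // Expands one query term to itself plus every index term with the same stem.
    static SortedSet<String> expand(String queryTerm, Map<String, SortedSet<String>> table) {
        SortedSet<String> variants = new TreeSet<>();
        SortedSet<String> known = table.get(stem(queryTerm));
        if (known != null) {
            variants.addAll(known);
        }
        variants.add(queryTerm);
        return variants;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> table = buildVariantTable(
                Arrays.asList("walk", "walks", "walking", "walked", "talk"));
        // Prints: walk OR walked OR walking OR walks
        System.out.println(String.join(" OR ", expand("walking", table)));
    }
}
```

When stemming is not wanted, you skip the expansion and search for the
query term as typed.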


(NOTE: I've never actually tried this, but I think the theory is sound).


-Hoss



