lucene-java-user mailing list archives

From Robert Watkins
Subject Re: stemmed search and exact match on "same" field
Date Tue, 15 Aug 2006 10:58:33 GMT
Thank you, Chris. You have confirmed what I had all but resigned myself
to (and you summarized my goal precisely). I am sticking with the two
versions of the field and just accepting the fact that the search
clients will need to use my custom query parser.

Even if one doesn't get the answer one wants, it's at least comforting
to get a definitive answer so that the speculation can stop and the work
can get done.

thanks again,
-- Robert

On Mon, 14 Aug 2006, Chris Hostetter wrote:

> There's a lot of information in your email, and a lot of questions that
> relate to similar topics and address different ways of accomplishing
> similar but different things ... too much for me to digest
> all at once, so lemme start by seeing if i can summarize your goal, and
> then give you my suggestion based on the goal as i see it...
> You want simple term matches to be "stemmed" but you want phrase queries to
> be "unstemmed"
> so if a user queries for the word...
> 	jumped
> you want that to match any of the words: jump, jumps, jumped, etc...
> if a user queries for...
> 	"the dogs"
> you want that to only match the exact phrase and not something with the
> tokens "the dog"
> you want these ideas to work, even if phrases and terms are mixed in
> the user's query...
> 	foo:jumped bar:"the dogs"
> My first thought is that you keep using two versions of the field (one
> stemmed and one unstemmed) and you then subclass QueryParser and override
> the getFieldQuery(String field, String queryText) method ... if the second
> arg looks like a phrase to you (ie: it has spaces or what not) then return
> super.getFieldQuery(field, queryText).  If it's not a phrase, then call
> super.getFieldQuery(field + "_STEMMED", queryText).
> where this breaks down is if you want the non-stemmed behavior even if the
> user's "phrase" only contains one word, ie...
> 	foo:jumped bar:"dogs"
> ...because the information that "dogs" was in quotes is lost by the time
> getFieldQuery is called.  You'd have to write a lot more QueryParsing code
> to get that behavior.
> In general, for your goal, i would not attempt to put both the stemmed and
> unstemmed tokens in the same field -- because as i think you mentioned,
> there is no way to tell them apart at query time.
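
The routing rule described in the quoted message can be sketched roughly as follows. This is only an illustration of the field-selection step, written in plain Java with no Lucene dependency; the class StemRouting and its methods are hypothetical names, and the whitespace check is just one reading of "it has spaces or what not". In an actual QueryParser subclass, the overridden getFieldQuery(String field, String queryText) would presumably pass the chosen field name on to super.getFieldQuery.

```java
/** Hypothetical helper for illustration only; it is not part of Lucene. */
final class StemRouting {
    // Suffix used in the message for the stemmed copy of each field.
    static final String STEMMED_SUFFIX = "_STEMMED";

    private StemRouting() {}

    /** Assumed heuristic: text with interior whitespace is treated as a phrase. */
    static boolean looksLikePhrase(String queryText) {
        String trimmed = queryText.trim();
        for (int i = 0; i < trimmed.length(); i++) {
            if (Character.isWhitespace(trimmed.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Picks the field to search: the unstemmed field for phrases,
     * the "_STEMMED" variant for single terms.
     */
    static String targetField(String field, String queryText) {
        return looksLikePhrase(queryText) ? field : field + STEMMED_SUFFIX;
    }
}
```

With this sketch, targetField("foo", "jumped") selects foo_STEMMED and targetField("bar", "the dogs") selects bar. A one-word phrase arrives without its quotes, so targetField("bar", "dogs") also selects bar_STEMMED, which is the limitation the message points out.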
