lucene-java-user mailing list archives

From Tatu Saloranta <>
Subject Re: QueryParser and compound words
Date Fri, 14 Mar 2003 05:21:42 GMT
On Thursday 13 March 2003 00:52, Magnus Johansson wrote:
> Tatu Saloranta wrote:
> >But same happens during indexing; fotbollsmatch should be properly
> >split and stemmed to "fotboll" and "match" terms, right?
> Yes but the word fotbollsmatch was never indexed in this example. Only
> the word fotboll.
> I want a query for fotbollsmatch to match a document containing the word
> fotboll.

Ok I think I finally understand what you meant. :-)

So, basically, in your case you would prefer the query:

fotbollsmatch

to expand to (after stemming etc):

fotboll match

and not

"fotboll match"

That way, matching just one of the words would be enough for a hit (the
policy could be "either word", "only the first word" or "only the last word").
It should be possible to implement this by subclassing QueryParser and
changing its behavior slightly.

In QueryParser you should be able to override the default handling of terms,
so that whenever a single token (in this case "fotbollsmatch") expands to
multiple Terms, you construct not a PhraseQuery but a BooleanQuery of
TermQueries (look at getFieldQuery(); it handles basic search terms). You
may need a simple heuristic to detect white space that indicates a "normal"
phrase, which probably should still be handled using PhraseQuery.
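
To make the decision concrete, here is a rough sketch in plain Java. It does
not use any Lucene classes: analyze() is a hypothetical stand-in for whatever
Analyzer you use at index time (with a toy rule that only knows how to split
"fotbollsmatch"), and fieldQuery() just returns a string in query syntax
instead of building real Query objects. The branching is what an overridden
getFieldQuery() would presumably need to do.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the decision only; none of these names are Lucene API.
final class CompoundQuerySketch {

    // Hypothetical stand-in for the analyzer: lowercases, splits on white
    // space, and splits one hard-coded compound (illustration only).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (word.equals("fotbollsmatch")) {
                tokens.add("fotboll");
                tokens.add("match");
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Returns a textual query: a phrase when the user typed white space,
    // otherwise optional (OR-ed) term clauses for an expanded compound.
    static String fieldQuery(String field, String queryText) {
        List<String> tokens = analyze(queryText);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        // Heuristic: white space in the raw text means a "normal" phrase.
        boolean typedAsPhrase =
                queryText.trim().chars().anyMatch(Character::isWhitespace);
        if (typedAsPhrase) {
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // Single word that expanded to several terms: any one may match.
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return String.join(" ", clauses);
    }
}
```

With this, fieldQuery("body", "fotbollsmatch") gives two optional term
clauses, while fieldQuery("body", "fotboll match") still gives a phrase. The
"only the first word" or "only the last word" policies would just keep one
clause instead of all of them.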

Of course, this all assumes you still want that functionality. :-)
And if you do, it would be a good idea to contribute the patch back in case
someone else finds it useful later on (I think many non-English languages
have compound words; German and Finnish at least do).

-+ Tatu +-
