lucene-java-user mailing list archives

From Tatu Saloranta
Subject Re: QueryParser and compound words
Date Thu, 13 Mar 2003 01:17:07 GMT
On Wednesday 12 March 2003 01:19, Magnus Johansson wrote:
> Well, the problem arises when a user enters a query with a compound word
> and the compound word itself is not indexed, only one of its parts.

Yes, but the compound word itself is never used in the query either, assuming
the same analyser is used (as it always should be)?

> For example, the index contains a document with the following word:
> fotboll (football).
> Let's say the user searches for fotbollsmatch (football game). The word
> is split into fotboll and match, and the phrase "fotboll match" is
> searched for.
> The user finds no matching document.

But the same happens during indexing; fotbollsmatch should be properly
split and stemmed into the terms "fotboll" and "match", right?
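
To make that concrete, here is a minimal sketch, in Python rather than actual Lucene code. The `analyze` function and its one-entry `COMPOUNDS` table are hypothetical stand-ins for a real Swedish tokenizer plus stemmer; the only point is that the same function runs at index time and at query time, so a document that contained "fotbollsmatch" ends up with the same terms as the query.

```python
# Toy stand-in for a decompounding analyser; this is NOT Lucene API.
# COMPOUNDS is a hypothetical lookup table; a real analyser would use
# a dictionary or language-specific rules instead.
COMPOUNDS = {"fotbollsmatch": ["fotboll", "match"]}


def analyze(text):
    """Lowercase, split on whitespace, and split known compounds into parts."""
    terms = []
    for token in text.lower().split():
        terms.extend(COMPOUNDS.get(token, [token]))
    return terms


def contains_phrase(doc_terms, query_terms):
    """True if query_terms occur consecutively, in order, in doc_terms."""
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


indexed = analyze("Fotbollsmatch idag")  # index-time analysis
query = analyze("fotbollsmatch")         # query-time analysis, same function

print(indexed)                           # ['fotboll', 'match', 'idag']
print(query)                             # ['fotboll', 'match']
print(contains_phrase(indexed, query))   # True
```

Under these assumptions the phrase search succeeds, because both sides were split the same way.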

> Comparing this to English, the user would have found a document, though
> it would have scored slightly lower than a document containing both the
> words football and game.
> I agree with you that this might not be a problem. The user could be
> instructed to reformulate his query. However, the behaviour for an
> English index and

I actually think that if the user has to be aware of internal stemming and
reformulate the query, that would be a bit of a problem. :-)
But I'm not 100% sure the search string would differ from the indexed string,
assuming the same base token (the unprocessed token, i.e. "fotbollsmatch") was
both contained in the document and searched for using QueryParser.

> a Swedish index would be different.

I think that in general the behaviour depends heavily on the analyser
(tokenizer + stemmer) being used, so it probably differs between most languages.
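
As a rough illustration of that dependence, here is a small Python sketch (again, not Lucene code). Both analysers and the `SWEDISH_COMPOUNDS` table are made up for the example. It runs the case from the quoted message, a document containing only "fotboll", through an analyser with and without compound splitting, next to the English equivalent.

```python
# Hypothetical analysers for illustration only; neither is a real Lucene class.
SWEDISH_COMPOUNDS = {"fotbollsmatch": ["fotboll", "match"]}  # assumed toy dictionary


def whitespace_analyzer(text):
    """Lowercase and split on whitespace, nothing else."""
    return text.lower().split()


def decompounding_analyzer(text):
    """Like whitespace_analyzer, but also splits known compounds."""
    terms = []
    for token in whitespace_analyzer(text):
        terms.extend(SWEDISH_COMPOUNDS.get(token, [token]))
    return terms


def matches_any(analyzer, document, query):
    """OR-style match: at least one query term occurs in the document."""
    return bool(set(analyzer(document)) & set(analyzer(query)))


def matches_all(analyzer, document, query):
    """AND-style match: every query term occurs in the document."""
    return set(analyzer(query)) <= set(analyzer(document))


# English: whitespace already separates the two words.
print(matches_any(whitespace_analyzer, "football", "football game"))    # True
# Swedish without compound splitting: the terms never meet.
print(matches_any(whitespace_analyzer, "fotboll", "fotbollsmatch"))     # False
# Swedish with compound splitting: a partial match becomes possible...
print(matches_any(decompounding_analyzer, "fotboll", "fotbollsmatch"))  # True
# ...but requiring every part (as a phrase search effectively does) still fails.
print(matches_all(decompounding_analyzer, "fotboll", "fotbollsmatch"))  # False
```

So whether the "fotboll"-only document is found depends both on how the analyser splits the word and on whether the resulting terms are combined as a phrase or as alternatives.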

-+ Tatu +-
