lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: dash-words
Date Wed, 02 Aug 2006 20:05:07 GMT
: with a query like this +arbeiterjugend +west-berlin I get no results.
: org.apache.lucene.queryParser.QueryParser.parse makes this query (with
: WordDelimiterFilter) with Default QueryParser.AND_OPERATOR:
: +titel:arbeiterjugend +titel:"west (berlin westberlin)"
: with +arbeiterjugend +westberlin I get the result.
: It seems that the synonyms don't work with the query. How do you solve
: this in Solr? Do I have to build a TermQuery?

First off, when using WordDelimiterFilter it's generally a good idea to
use a slightly different configuration of the Filter in your indexing
analyzer than in your query analyzer -- this is discussed a bit in the

...this can help avoid situations like you describe.

but more broadly, what you are running into is a constraint of the
way Analyzers work: they can produce tokens with a "zero gap" (a position
increment of 0) indicating that they occupy the same spot as the previous
token, but there is no way for the analyzer to indicate that a sequence of
1 or more tokens occupies the same space as another sequence of 1 or more
tokens.  so when QueryParser asks the analyzer to make a token stream out
of "west-berlin" the analyzer has no way to return a token stream that can
easily be recognized as [ [[west] [berlin]] or [westberlin] ].  instead
"westberlin" ends up stacked on "berlin" alone, which is how you get the
phrase query "west (berlin westberlin)".
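
To make the "zero gap" point concrete, here is a rough sketch in plain Java.
It does not use Lucene at all: Tok and byPosition are hypothetical names made
up for illustration, and the increments (1, 1, 0) are assumed from the parsed
query quoted above rather than read from a real WordDelimiterFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PositionSketch {
    /** Hypothetical stand-in for an analyzer token: a term plus its gap from the previous token. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Groups terms by absolute position; an increment of 0 stacks a term on the previous one. */
    static Map<Integer, List<String>> byPosition(List<Tok> stream) {
        Map<Integer, List<String>> slots = new TreeMap<>();
        int pos = -1; // the first token with increment 1 lands on position 0
        for (Tok t : stream) {
            pos += t.positionIncrement;
            slots.computeIfAbsent(pos, k -> new ArrayList<>()).add(t.term);
        }
        return slots;
    }

    public static void main(String[] args) {
        // Assumed token stream for "west-berlin": two word parts, then the
        // concatenated form emitted with a zero gap.
        List<Tok> westBerlin = Arrays.asList(
                new Tok("west", 1),
                new Tok("berlin", 1),
                new Tok("westberlin", 0));

        // "westberlin" can only share a slot with "berlin"; nothing in the
        // stream says that it spans both "west" and "berlin".
        System.out.println(byPosition(westBerlin));
    }
}
```

Grouping by position like this is roughly what a phrase query built from
the stream sees: one slot holding "west", then one slot holding both "berlin"
and "westberlin".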

this does in fact prove to be a large problem when dealing with "multi
word synonyms" (also discussed in the wiki mentioned above) but can
generally be dealt with in the WordDelimiterFilter.

