lucene-dev mailing list archives

From Michael McCandless
Subject Re: An unusual (new?) multi term synonym bug?
Date Fri, 26 Feb 2016 10:43:19 GMT
You can't put a SynonymFilter in front of another one, because the 2nd
one is unable to properly consume an arbitrary token graph.

For the same reason, you can't put e.g. JapaneseTokenizer before a
SynonymFilter and expect it to always work.

Mike McCandless

On Thu, Feb 25, 2016 at 6:10 PM, Ryan Josal wrote:
> I know there's a ton of documentation about the query parser whitespace
> issue, and there's also a fair bit of info on the PositionLengthAttribute
> issue, but I seem to have stumbled upon a new issue with multi-term
> synonyms: they don't seem to play well with a bunch of tokens in the same
> position.
> I have a synonym filter with this expansion:
> side table,end table
> I can see the synonym is applied when looking at the token stream output for
> "side table".  Today I decided to add another SynonymFilter
> immediately before that one, with WordNet synonym expansions.  As expected,
> WordNet bloats the token stream, but all of a sudden the original end
> table expansion doesn't get applied.  I see "side" followed by a bunch of
> tokens in the same position, followed by a couple of new tokens in the next
> position, followed by "table" in that same position, followed by some
> more new tokens in the same position.  Since "side" is still adjacent to
> "table" in token positions, I would expect the synonym to hit.  Is this a known
> issue (what's the Jira)?  The impact seems significant.  Since WordNet is so
> comprehensive, it's likely going to cause this issue with most of my
> multi-term synonyms.  Maybe the workaround is to apply multi-term synonyms
> first as far as possible, although I don't know if you have that kind of
> control if all your synonyms are applied by a single SynonymFilter.
> Thanks,
> Ryan
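
The behaviour discussed in this thread can be illustrated with a toy model.
The sketch below is not Lucene code and does not reproduce SynonymFilter's
actual matching logic. It assumes only that a token carries a term and a
position increment, and that a matcher reads tokens strictly in stream order.
The function names and the WordNet-like entries are hypothetical. It shows
how "side" and "table" can remain adjacent by position while no longer being
adjacent in the stream once extra tokens are stacked on "side".

```python
# Toy model only: each token is (term, position_increment), loosely mimicking
# the idea behind Lucene's position increments. This is not Lucene code.

def expand_single(tokens, synonyms):
    """Stack single-word synonyms on each token (position increment 0)."""
    out = []
    for term, inc in tokens:
        out.append((term, inc))
        for syn in synonyms.get(term, []):
            out.append((syn, 0))
    return out

def positions(tokens):
    """Turn position increments into absolute positions."""
    pos, out = 0, []
    for term, inc in tokens:
        pos += inc
        out.append((term, pos))
    return out

def naive_phrase_match(tokens, phrase):
    """Match a phrase against consecutive tokens in stream order,
    ignoring that stacked tokens share a position (the assumed weakness)."""
    terms = [term for term, _ in tokens]
    n = len(phrase)
    return any(terms[i:i + n] == phrase for i in range(len(terms) - n + 1))

original = [("side", 1), ("table", 1)]
# Hypothetical stand-ins for WordNet expansions.
wordnet_like = {"side": ["flank", "edge"], "table": ["desk"]}

stacked = expand_single(original, wordnet_like)

print(positions(stacked))
print(naive_phrase_match(original, ["side", "table"]))
print(naive_phrase_match(stacked, ["side", "table"]))
```

In this model the rule "side table" matches the original stream but not the
stacked one, even though positions() still reports "side" and "table" at
neighbouring positions. That is consistent with the workaround suggested
above of applying multi-term synonyms before any filter that stacks tokens.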
