lucene-dev mailing list archives

From Ryan Josal <rjo...@gmail.com>
Subject Re: An unusual (new?) multi term synonym bug?
Date Fri, 26 Feb 2016 15:13:28 GMT
Is this by design, or is there a Jira issue to track it?  It makes it a
little difficult to use my own synonyms with wordnet.  Other use cases:
*) SynonymFilters separated by other filters
*) SynonymFilters with different analyzers configured
*) SynonymFilters with different case sensitivity

Expansion wouldn't be an issue since you can control it with the synonym
file format.
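
As an illustration of controlling expansion through the file, here is a
small fragment assuming the Solr-style synonyms format; the entries reuse
the "side table" example from the quoted message, and the comments are
mine:

```text
# Equivalent terms: how these are expanded depends on the filter's
# "expand" setting.
side table, end table

# Explicit mapping: the left-hand side is replaced by the right-hand side,
# regardless of the "expand" setting.
side table => end table
```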

Would it be ok to update some documentation about this?  The
AnalyzersTokenizersTokenFilters page comes to mind (by the way, the
tokenizerFactory search-lucene.com link is throwing an exception).

Ryan

On Friday, February 26, 2016, Michael McCandless <lucene@mikemccandless.com>
wrote:

> You can't put a SynonymFilter in front of another one, because the 2nd
> one is unable to properly consume an arbitrary graph.
>
> For the same reason, you can't put e.g. JapaneseTokenizer before a
> SynonymFilter and expect it to always work.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 25, 2016 at 6:10 PM, Ryan Josal <rjosal@gmail.com> wrote:
> > I know, there's a ton of documentation about the query parser whitespace
> > issue, and there's also a fair bit of info on the positionLengthAttribute
> > issue, but I seem to have stumbled upon a new issue with multi term
> > synonyms: it doesn't seem to play well with a bunch of tokens in the same
> > position.
> >
> > I have a synonym filter with this expansion:
> > side table,end table
> >
> > I can see the synonym is applied when looking at the token stream output
> > for "side table".  Today I decided to throw an additional synonymFilter
> > immediately before that one with wordnet synonym expansions.  Wordnet
> > expectedly bloats the tokenstream, but all of a sudden the original end
> > table expansion doesn't get applied.  I see "side" followed by a bunch of
> > tokens in the same position, followed by a couple new tokens in the next
> > position, followed by "table" in the same token position, followed by
> > some more new tokens in the same position.  Since side is still adjacent
> > to table in token positions, I would expect the synonym to hit.  Is this
> > a known issue (what's the Jira)?  The impact seems significant.  Since
> > wordnet is so comprehensive, it's likely going to cause this issue with
> > most of my multi term synonyms.  Maybe the workaround is to apply multi
> > term synonyms first as best as possible, although I don't know if you
> > have that kind of control if all your synonyms are applied by a single
> > SynonymFilter.
> >
> > Thanks,
> > Ryan
> >

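To make the behaviour discussed in this thread concrete, here is a toy
model in Python. It is only a sketch under stated assumptions, not
Lucene's actual SynonymFilter code: tokens are (term, position increment)
pairs, the wordnet-style expansions ("face", "slope", "board", "desk",
"tabulate") are made up for illustration, and the two matcher functions
are hypothetical. One matcher compares the rule against tokens in the
order they are emitted; the other groups tokens by position first.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int]  # (term, position increment)

RULE = ["side", "table"]

# "side table" straight from the tokenizer.
PLAIN: List[Token] = [("side", 1), ("table", 1)]

# The same input after a hypothetical wordnet-style expansion: extra
# tokens are stacked at each position (increment 0), as described above.
STACKED: List[Token] = [
    ("side", 1), ("face", 0), ("slope", 0),
    ("board", 1), ("desk", 0), ("table", 0), ("tabulate", 0),
]


def matches_in_emit_order(tokens: List[Token], rule: List[str]) -> bool:
    """Match the rule against consecutive tokens, ignoring positions."""
    terms = [term for term, _ in tokens]
    n = len(rule)
    return any(terms[i:i + n] == rule for i in range(len(terms) - n + 1))


def matches_by_position(tokens: List[Token], rule: List[str]) -> bool:
    """Group tokens by position, then match the rule across positions."""
    positions: List[Set[str]] = []
    for term, inc in tokens:
        # The first token always opens a position; later tokens open one
        # per increment step (increment 0 stacks on the current position).
        steps = inc if positions else max(inc, 1)
        for _ in range(steps):
            positions.append(set())
        positions[-1].add(term)
    n = len(rule)
    return any(
        all(rule[j] in positions[i + j] for j in range(n))
        for i in range(len(positions) - n + 1)
    )


if __name__ == "__main__":
    print(matches_in_emit_order(PLAIN, RULE))
    print(matches_in_emit_order(STACKED, RULE))
    print(matches_by_position(STACKED, RULE))
```

In this model the emit-order matcher finds "side table" in the plain
stream but not in the stacked one, because "face" is the token that
follows "side". The position-aware matcher still finds it, which is the
adjacency the original message expected. Note that even the second
matcher ignores position length, so it is not a full graph consumer
either.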