lucene-solr-user mailing list archives

From Cliff Dickinson <cliff.dickin...@gmail.com>
Subject Solr 6.4 new SynonymGraphFilter help for multi-word synonyms
Date Thu, 02 Feb 2017 14:36:21 GMT
I've been eagerly awaiting the release of the new SynonymGraphFilter in
Solr 6.4.  We have the need to support multi-word synonyms, which were
always problematic with the old SynonymFilterFactory.  I've upgraded to
Solr 6.4 and replaced the old filter with the new one, but am not seeing
the results that I had hoped for yet.  I suspect my configuration is
lacking something important.

I'm starting with the simple example in the SynonymGraphFilterFactory API
documentation:

<fieldType name="text_synonym" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory"
                synonyms="synonyms.txt"
                format="solr" ignoreCase="false" expand="true"
                tokenizerFactory="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
</fieldType>

An example entry in the synonyms.txt file is:

booster, representative of athletics interest

My problem with the old filter has always been that a query for "booster"
returns documents containing any of the following words: booster,
representative, athletics, interest.  That is far more results than I
want: a document that contains only "athletics", and none of the other
words in the synonym, is still returned.  What I really want are documents
that contain either "booster" or the full synonym phrase "representative
of athletics interest".  How could I accomplish this?
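
To make the difference concrete, here is a toy sketch in plain Python.  It
is not Solr or Lucene code, and it only approximates what the old filter
does; the function names and the simple tokenizer are made up for
illustration.  matches_observed() mimics the behavior described above (any
single word from the synonym entry is enough), while matches_desired()
requires either "booster" or the whole phrase in order.

```python
import re

# The two alternatives from the synonyms.txt entry above.
SYNONYMS = ["booster", "representative of athletics interest"]


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase):
    """True if the tokens of `phrase` appear consecutively inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches_observed(doc, alternatives):
    """Approximates the unwanted behavior: any single word is a hit."""
    tokens = set(tokenize(doc))
    return any(word in tokens for alt in alternatives for word in tokenize(alt))


def matches_desired(doc, alternatives):
    """The wanted behavior: a full alternative must appear as a phrase."""
    tokens = tokenize(doc)
    return any(contains_phrase(tokens, tokenize(alt)) for alt in alternatives)
```

With these definitions, a document such as "athletics schedule" is a hit
for matches_observed() but not for matches_desired(), which is exactly the
distinction I am after.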

The SynonymGraphFilter API documentation contains the following statement
at the end:

"To get fully correct positional queries when your synonym replacements are
multiple tokens, you should instead apply synonyms using this TokenFilter
at query time and translate the resulting graph to a TermAutomatonQuery
e.g. using TokenStreamToTermAutomatonQuery."

How do I use TokenStreamToTermAutomatonQuery?  Can it be configured in
Solr, or only by writing code against Lucene?  Would this even address
my issue?

I've found synonyms to be very frustrating in Solr and am hoping this new
filter will be a big improvement.  Thanks in advance for the help!
