lucene-java-user mailing list archives

From "Vincenzo D'Amore" <v.dam...@gmail.com>
Subject Re: SynonymFilterFactory deprecated
Date Tue, 07 Nov 2017 21:43:48 GMT
Hi Mike,

Thanks for suggesting this very interesting post. I tried to dig deeper by
also reading:

https://issues.apache.org/jira/browse/LUCENE-6664
https://www.elastic.co/blog/multitoken-synonyms-and-graph-queries-in-elasticsearch
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

It is still not clear to me whether, and how, I can have multiple
SynonymGraphFilters in the same chain.

Anyway, I tried starting a brand new Solr 7.1.0 instance and modifying
the "text_general" fieldType:

  <fieldType name="text_general" class="solr.TextField"
             positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms_2.txt"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms_2.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

This seems to return the expected results but, as I said earlier, I am not
sure whether chaining the filters this way has any drawbacks.
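
For comparison, here is a minimal sketch of a variant that avoids chaining
two synonym filters, in light of Mike's note (quoted below) that
SynonymGraphFilter cannot consume a graph as input. It assumes that the
"synonyms" attribute accepts a comma-separated list of files, so both files
are loaded by a single filter. It keeps FlattenGraphFilterFactory at index
time only, as the quoted documentation recommends. Everything else is
unchanged from the config above, and the sketch is untested here:

```xml
  <!-- Sketch only: one SynonymGraphFilterFactory per analyzer.
       Assumption: "synonyms" takes a comma-separated list of files. -->
  <fieldType name="text_general" class="solr.TextField"
             positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <!-- Single synonym filter, so it never receives a graph as input -->
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms.txt,synonyms_2.txt"/>
      <!-- Flatten once, at index time only -->
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <!-- No FlattenGraphFilterFactory at query time -->
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms.txt,synonyms_2.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
```

Whether a merged file list behaves the same as the two-filter chain depends
on how the rules in the two files interact, so the Analysis screen output for
both variants would need to be compared.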

Best regards,
Vincenzo




On Wed, Oct 25, 2017 at 12:50 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> You need only one FlattenGraphFilter at the end of your analysis chain.
>
> But note that neither SynonymGraphFilter nor SynonymFilter can consume a
> graph as input; so multiple SynonymGraphFilters will not work.
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> gives some insight into why synonym filters create graphs, but it was
> written before SynonymGraphFilter and FlattenGraphFilter.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Oct 25, 2017 at 5:04 AM, Vincenzo D'Amore <v.damore@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I see that SynonymFilterFactory is deprecated in Solr:
>>
>> https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilterFactory.html
>>
>> The documentation suggests:
>>
>> > Use SynonymGraphFilterFactory
>> > <https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.html>
>> > instead, but be sure to also use FlattenGraphFilterFactory
>> > <https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/core/FlattenGraphFilterFactory.html>
>> > at index time (not at search time) as well.
>>
>>
>> On the other hand, the documentation also says that
>> FlattenGraphFilterFactory is experimental and might change in
>> incompatible ways in the next release.
>>
>> I am not sure what to do in this case. It is not clear to me what
>> FlattenGraphFilterFactory does, or why I should place it after the
>> SynonymGraphFilterFactory.
>>
>> Also, if I have several SynonymGraphFilterFactory entries at index time,
>> can I have a single FlattenGraphFilterFactory at the end of the chain, or
>> should I add one FlattenGraphFilterFactory for each
>> SynonymGraphFilterFactory in the chain?
>>
>> Thanks for your time and best regards,
>> Vincenzo
>>
>
>
