lucene-java-user mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: SynonymFilterFactory deprecated since 6.4.0
Date Thu, 09 Feb 2017 07:40:46 GMT
I tried SynonymGraphFilter with my setup and it works right away.
It paid off that I made some modifications to my filters while
testing 6.3 with my setup.

I only replaced SynonymFilter with SynonymGraphFilter and did not
use FlattenGraphFilter, pretty simple. So I can confirm that, up
to this point, SynonymGraphFilter is a full replacement for
SynonymFilter, at least for search-time synonym handling.
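
For illustration, a search-time setup along these lines might look like the
sketch below. The field type name `text_syn`, the tokenizer and lowercase
filter, and the attribute values are assumptions and are not taken from the
setup described in this thread; only the two filter factory class names come
from Solr itself.

```xml
<!-- Hypothetical field type: name, tokenizer, and attribute values are assumptions -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Search-time synonyms: graph filter only, no FlattenGraphFilterFactory -->
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

If synonyms were applied at index time instead, a
`solr.FlattenGraphFilterFactory` would follow the synonym filter in the index
analyzer, because the indexer cannot consume a token graph directly.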

But this also means there is still some work to do on the attributes, right?
Position looks good, type and start are no problem anyway, but
the end position and the positionLength are still wrong for multi-word
synonyms.

One thing I noticed was that the originating token which "produces"
synonyms comes out last from SynonymGraphFilter, after the
"produced" synonyms.
I will have a look inside with a debugger, but I guess this is due
to the output buffering of SynonymGraphFilter?

Regards
Bernd


On 07.02.2017 at 19:31, Michael McCandless wrote:
> Thanks for sharing; it looks like a nice set of synonyms!
> 
> It's good that you already apply them at search-time not index-time.
> 
> In that case, you should not use the FlattenGraphFilter, because
> SynonymGraphFilter will produce a correct graph (unlike SynonymFilter)
> and the Lucene query parsers (not sure about Solr's query parser fork)
> will correctly detect the graph and create the right query.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Tue, Feb 7, 2017 at 7:46 AM, Bernd Fehling
> <bernd.fehling@uni-bielefeld.de> wrote:
>> Years ago (2007) I installed the Eurovoc Thesaurus to work with our
>> search engine for multilingual search (terms and phrases in 22 languages).
>>
>> http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html
>>
>> The synonyms.txt file is 8.8 MB in size and, as an FST, yields over 300,000
>> n-to-m mappings due to permutation.
>> A single term/token can yield several single- and multi-word synonyms,
>> and multi-word terms/tokens can likewise yield single- and multi-word synonyms.
>> Position increment and position length are handled correctly.
>> The originating search term, together with its direct synonyms, can be boosted.
>>
>> I will look into SynonymGraphFilter and FlattenGraphFilter to see how they
>> compare to my development.
>>
>> Regards
>> Bernd
>>
>>
>> On 07.02.2017 at 12:34, Michael McCandless wrote:
>>> That's great that multi-token synonyms are working for you; can you
>>> describe how you use them?
>>>
>>> This blog post describes some of the problems:
>>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>>>
>>> I'm working on another blog post to describe the recent changes ...
>>> should be out in maybe a week or so.
>>>
>>> Anyway, to just keep doing what you are doing today, you should switch
>>> to SynonymGraphFilter followed by FlattenGraphFilter: it will make the
>>> same tokens as the current SynonymFilter, but will necessarily be
>>> buggy in the multi-token case.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Tue, Feb 7, 2017 at 6:07 AM, Bernd Fehling
>>> <bernd.fehling@uni-bielefeld.de> wrote:
>>>> I just tried Solr 6.4.1 and noticed that SynonymFilterFactory is
>>>> deprecated, as reported in the logs.
>>>>
>>>> I hope that this is just to note that there is also an alternative
>>>> SynonymGraphFilterFactory now available.
>>>>
>>>> And _not_ that SynonymFilterFactory will disappear, because it has been
>>>> running my multi-word synonym thesaurus for years like a charm.
>>>> I hate to reinvent the wheel.
>>>>
>>>> Regards
>>>> Bernd
>>>>
