lucene-java-user mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: SynonymFilterFactory deprecated since 6.4.0
Date Tue, 07 Feb 2017 12:46:05 GMT
Years ago (2007) I installed the Eurovoc Thesaurus to work with our
search engine for multilingual search (terms and phrases in 22 languages).

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

The synonyms.txt file is 8.8 MB in size and compiles into an FST with over
300,000 n-to-m mappings, due to permutation.
A single term/token can yield several single- and multi-word synonyms,
and multi-word terms/tokens can likewise yield single- and multi-word synonyms.
Position increment and position length are handled correctly.
The original search term and its direct synonyms are (or can be) boosted.
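
To make "position increment" and "position length" concrete, here is a
minimal sketch in plain Java. It is only an illustrative model: the classes
Token and Arc are hypothetical and are not Lucene types, and the
"dns" / "domain name service" mapping is an invented example, not an entry
from the Eurovoc thesaurus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: these classes are NOT Lucene types.
class TokenGraphSketch {

    /** A token as an analyzer might emit it: a term plus two position attributes. */
    static final class Token {
        final String term;
        final int posInc; // distance from the previous token's start position
        final int posLen; // number of positions this token spans

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** The same token viewed as an arc in the token graph, from one position to another. */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return term + " [" + from + " -> " + to + "]";
        }
    }

    /** Turns (posInc, posLen) pairs into absolute start and end positions. */
    static List<Arc> toArcs(List<Token> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // a first token with posInc=1 then starts at position 0
        for (Token t : tokens) {
            pos += t.posInc;
            arcs.add(new Arc(t.term, pos, pos + t.posLen));
        }
        return arcs;
    }

    /** Hypothetical stream for "dns down" with "dns" expanded to "domain name service". */
    static List<Token> sample() {
        return Arrays.asList(
            new Token("domain", 1, 1),
            new Token("dns", 0, 3),     // same start as "domain", spans three positions
            new Token("name", 1, 1),
            new Token("service", 1, 1),
            new Token("down", 1, 1));
    }

    public static void main(String[] args) {
        for (Arc a : toArcs(sample())) {
            System.out.println(a);
        }
    }
}
```

In this model "dns" ends at position 3, which is exactly where "down"
starts, so both paths through the graph lead to the same next token. If the
position length of "dns" were recorded as 1, it would appear to end at
position 1 and would no longer line up with "down".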

I will look into SynonymGraphFilter and FlattenGraphFilter to see how they
compare to my development.
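
For reference, a Solr field type using the two new filters might look
roughly like the sketch below. The field type name "text_syn", the choice
of tokenizer and the attribute values are assumptions for illustration, not
the configuration used in BASE. FlattenGraphFilterFactory appears only in
the index-time analyzer, since the index stores a flat token stream rather
than a graph.

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Index time only: flatten the token graph before it is written to the index -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Query time: keep the graph so multi-word synonyms can be interpreted as such -->
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```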

Regards
Bernd


Am 07.02.2017 um 12:34 schrieb Michael McCandless:
> That's great that multi-token synonyms are working for you; can you
> describe how you use them?
> 
> This blog post describes some of the problems:
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> 
> I'm working on another blog post to describe the recent changes ...
> should be out in maybe a week or so.
> 
> Anyway, to just keep doing what you are doing today, you should switch
> to SynonymGraphFilter followed by FlattenGraphFilter: it will make the
> same tokens as the current SynonymFilter, but will necessarily be
> buggy in the multi-token case.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Tue, Feb 7, 2017 at 6:07 AM, Bernd Fehling
> <bernd.fehling@uni-bielefeld.de> wrote:
>> I just tried Solr 6.4.1 and noticed that SynonymFilterFactory is
>> deprecated, as reported in the logs.
>>
>> I hope that this is just to note that there is also an alternative
>> SynonymGraphFilterFactory now available.
>>
>> And _not_ that SynonymFilterFactory will disappear, because it has been
>> running my multi-word synonyms thesaurus for years like a charm.
>> I hate to reinvent the wheel.
>>
>> Regards
>> Bernd
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 

-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
