Subject: Re: SynonymFilterFactory deprecated since 6.4.0
From: Bernd Fehling
To: java-user@lucene.apache.org
Date: Tue, 07 Feb 2017 13:46:05 +0100

Years ago (2007) I installed the Eurovoc Thesaurus to work with our
search engine for multilingual search (terms and phrases in 22 languages).

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

The synonyms.txt file is 8.8 MB in size and, as an FST, yields over
300,000 n-to-m mappings due to permutation.
A single term/token can produce several single- and multi-word synonyms,
and multi-word terms/tokens can likewise produce single- and multi-word
synonyms.
Position increment and position length are handled correctly.
The originating search term and its direct synonyms are, or can be,
boosted.

I will look into SynonymGraphFilter and FlattenGraphFilter to see how
they compare to my development.

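To illustrate how a modest synonyms file can grow into a very large
number of n-to-m mappings, here is a minimal sketch in Python. It is not
the BASE or Lucene implementation. It assumes the common reading of a
comma-separated synonyms line in which every term in the group maps to
every term in the group (as Solr does with expand=true), and the sample
terms and the names expand_group and count_mappings are invented for
this illustration, not taken from Eurovoc or any library.

```python
from itertools import product


def expand_group(line):
    """Expand one comma-separated synonym line into (source, target) pairs.

    Assumption: every term in the group maps to every term in the group,
    itself included, so a group of n terms yields n * n pairs.
    """
    terms = [t.strip() for t in line.split(",") if t.strip()]
    return list(product(terms, repeat=2))


def count_mappings(lines):
    """Total number of (source, target) pairs over all synonym lines."""
    return sum(len(expand_group(line)) for line in lines)


# Hypothetical sample data mixing single- and multi-word terms.
groups = [
    "car, automobile, motor vehicle",
    "tax, levy",
]

pairs = expand_group(groups[0])
total = count_mappings(groups)
```

Because the pair count grows with the square of the group size, a few
large multilingual groups can account for most of the mappings.
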
Regards
Bernd

On 07.02.2017 at 12:34, Michael McCandless wrote:
> That's great that multi-token synonyms are working for you; can you
> describe how you use them?
>
> This blog post describes some of the problems:
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>
> I'm working on another blog post to describe the recent changes ...
> should be out in maybe a week or so.
>
> Anyway, to just keep doing what you are doing today, you should switch
> to SynonymGraphFilter followed by FlattenGraphFilter: it will make the
> same tokens as the current SynonymFilter, but will necessarily be
> buggy in the multi-token case.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Feb 7, 2017 at 6:07 AM, Bernd Fehling wrote:
>> I just tried Solr 6.4.1 and noticed that SynonymFilterFactory is
>> deprecated, as reported in the logs.
>>
>> I hope that this is just to note that there is also an alternative
>> SynonymGraphFilterFactory now available.
>>
>> And _not_ that SynonymFilterFactory will disappear, because it runs my
>> multi-word Synonyms Thesaurus now for years like a charm.
>> I hate to reinvent the wheel.
>>
>> Regards
>> Bernd
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************