Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
From: Steve Rowe
Date: Fri, 27 May 2016 12:28:26 -0400
To: solr-user@lucene.apache.org
Cc: chris@depahelix.com

I’m working on addressing problems using multi-term synonyms at query time in Lucene and Solr.

I recommend these two blogs for understanding the issues (the second one was mentioned earlier in this thread):

In addition to the already-mentioned projects, there is also:

All of these projects try in various ways to work around the fact that Lucene’s QueryParser splits on whitespace before sending text to analysis, one token at a time, so in a synonym filter, multi-word synonyms can never match and add alternatives. See LUCENE-2605, where I’ve posted a patch to directly address that problem - note that it’s still a work in progress.

Once LUCENE-2605 has been fixed, there is still work to do getting (e)dismax to work with the modified Lucene QueryParser, and addressing problems with how queries are constructed from Lucene’s “sausagized” token stream.
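
To illustrate the whitespace-splitting problem described above, here is a toy sketch in plain Java. It does not use Lucene's QueryParser or synonym filter APIs. The class SynonymSplitDemo, its methods, and the "heart attack" / "myocardial infarction" pair are all hypothetical, chosen only to show why a phrase synonym cannot match when each whitespace-separated token is analyzed on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Lucene's QueryParser or synonym filter API.
class SynonymSplitDemo {

    // Hypothetical synonym table: a multi-word phrase mapped to one alternative.
    static final Map<List<String>, String> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put(Arrays.asList("heart", "attack"), "myocardial infarction");
    }

    // Stand-in for a synonym filter: scans a single token stream and returns
    // the alternatives whose full phrase appears as consecutive tokens.
    static List<String> analyze(List<String> tokens) {
        List<String> alternatives = new ArrayList<>();
        for (Map.Entry<List<String>, String> entry : SYNONYMS.entrySet()) {
            List<String> phrase = entry.getKey();
            for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
                if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                    alternatives.add(entry.getValue());
                    break;
                }
            }
        }
        return alternatives;
    }

    // Split on whitespace first, then analyze each token separately.
    // The filter only ever sees one-token streams, so a two-word phrase
    // can never line up.
    static List<String> splitThenAnalyze(String query) {
        List<String> alternatives = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            alternatives.addAll(analyze(Arrays.asList(token)));
        }
        return alternatives;
    }

    // Analyze the whole query text as one token stream.
    static List<String> analyzeWhole(String query) {
        return analyze(Arrays.asList(query.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        String query = "heart attack symptoms";
        System.out.println("split first: " + splitThenAnalyze(query));
        System.out.println("whole text:  " + analyzeWhole(query));
    }
}
```

In this sketch only analyzeWhole ever hands the filter both words of the phrase together, which is the difference between the two code paths.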

--
Steve
www.lucidworks.com

> On May 26, 2016, at 2:21 PM, John Bickerstaff wrote:
>
> Thanks Chris --
>
> The two projects I'm aware of are:
>
> https://github.com/healthonnet/hon-lucene-synonyms
>
> and the one referenced from the Lucidworks page here:
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> ... which is here: https://github.com/LucidWorks/auto-phrase-tokenfilter
>
> Is there anything else out there that you would recommend I look at?
>
> On Thu, May 26, 2016 at 12:01 PM, Chris Morley wrote:
>
>> Chris Morley here, from Wayfair. (Depahelix = my domain)
>>
>> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> We worked mostly off of Ted Sullivan's work and also off of some
>> suggestions from Koorosh Vakhshoori. We have gotten to a point where we
>> have a more sophisticated internal implementation, however, we've found
>> that it is very difficult to make it do what you want it to do, and also be
>> sufficiently performant. Watch out for exceptional situations with mm
>> (minimum should match).
>>
>> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
>> done work in this area.
>>
>> It should be very possible to get this kind of thing working on
>> SolrCloud. I haven't tried it yet but I think theoretically, it should
>> just work. The synonyms stuff is mostly about doing things at index time
>> and query time. The index time stuff should translate to SolrCloud
>> directly, while the query time stuff might pose some issues, but probably
>> not too bad, if there are any issues at all.
>>
>> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
>> because a lot of stuff is just Java, and it still works within the Jetty
>> context.
>>
>> -Chris.
>>
>> ----------------------------------------
>> From: "John Bickerstaff"
>> Sent: Thursday, May 26, 2016 1:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
>>
>> Hey Jeff (or anyone interested in multi-word synonyms) here are some
>> potentially interesting links...
>>
>> http://wiki.apache.org/solr/QueryParser (search the page for
>> synonym_edismax)
>>
>> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
>> post about what became the synonym_edismax Query Parser)
>>
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>>
>> This last was useful for lots of reasons and contains links to other
>> interesting, related web pages...
>>
>> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes wrote:
>>
>>> Oh, interesting. I've certainly encountered issues with multi-word
>>> synonyms, but I hadn't come across this. If you end up using it with a
>>> recent Solr version, I'd be glad to hear your experience.
>>>
>>> I haven't used it, but I am aware of one other project in this vein that
>>> you might be interested in looking at:
>>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>>>
>>> On 5/26/16, 9:29 AM, "John Bickerstaff" wrote:
>>>
>>>> Ahh - for question #3 I may have spoken too soon. This line from the
>>>> github repository readme suggests a way.
>>>>
>>>> Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
>>>> it works (Jetty).
>>>>
>>>> I'll try that and only respond back if that doesn't work.
>>>>
>>>> Questions 1 and 2 still stand of course... If anyone on the list has
>>>> experience in this area...
>>>>
>>>> Thanks.
>>>>
>>>> On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <john@johnbickerstaff.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm creating a Solr Cloud that will index and search medical text.
>>>>> Multi-word synonyms are a pretty important factor.
>>>>>
>>>>> I find that there are some challenges around multi-word synonyms and I
>>>>> also found on the wiki that there is a recommended 3rd-party parser
>>>>> (synonym_edismax parser) created by Nolan Lawson and found here:
>>>>> https://github.com/healthonnet/hon-lucene-synonyms
>>>>>
>>>>> Here's the thing - the instructions on the github site involve bringing
>>>>> the jar file into the war file - which is not applicable any more... at
>>>>> least I think it's not...
>>>>>
>>>>> I have three questions:
>>>>>
>>>>> 1. Is this still a good solution for multi-word synonyms (I.e. Solr Cloud
>>>>> doesn't break it in some way)
>>>>> 2. Is there a tool or plug-in out there that the contributors would
>>>>> recommend above this one?
>>>>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
>>>>> for bringing it in to Solr Cloud (I'm running 5.4.x)
>>>>>
>>>>> Thanks