Date: Thu, 16 Jan 2014 09:13:11 -0500 (CST)
From: Jorge Luis Betancourt González
To: solr-user@lucene.apache.org
Message-ID: <1401432695.2324829.1389881591848.JavaMail.zimbra@uci.cu>
Subject: Re: Search Suggestion Filtering

In a custom application of ours, we use a separate core (under Solr 3.6.1) to store the queries entered by users, and we use that core to provide the autocomplete feature. In our case we need to filter out some phrases that we don't want suggested to users. I built a custom UpdateRequestProcessor to implement this logic, so we define these "blocking patterns" in an external source of information (DB, files, etc.). For the suggestions themselves we use the https://github.com/cominvent/autocomplete configuration as a base, described in www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/, which is quite usable as it comes. Personally, I found this approach far more flexible than the original Suggester component, but it involves storing the users' queries in a separate core.
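As a rough illustration of that filtering step, here is a minimal sketch in plain Python. It shows only the blocking-pattern check, not the actual UpdateRequestProcessor (which would be Java code running inside Solr). The function names, the example patterns, and the `query` field name are all assumptions made up for this example.

```python
import re
from typing import Iterable, List, Optional, Pattern


def compile_blocking_patterns(raw_patterns: Iterable[str]) -> List[Pattern[str]]:
    """Compile blocking patterns loaded from some external source (DB, file, ...)."""
    return [re.compile(p, re.IGNORECASE) for p in raw_patterns]


def is_blocked(query: str, patterns: Iterable[Pattern[str]]) -> bool:
    """Return True if any blocking pattern matches anywhere in the query."""
    return any(p.search(query) for p in patterns)


def process_add(doc: dict, patterns: List[Pattern[str]],
                field: str = "query") -> Optional[dict]:
    """Mimic an update-processor step (hypothetical, not the Solr API).

    Returns the document unchanged when it may be indexed, or None when it
    should be skipped so it never reaches the suggestion core.
    """
    query = doc.get(field, "")
    if is_blocked(query, patterns):
        return None
    return doc


if __name__ == "__main__":
    # Example patterns only; real ones would come from the external source.
    patterns = compile_blocking_patterns([r"\bpassword\b", r"^test\d*$"])
    for q in ["reset password", "solr autocomplete", "test123"]:
        kept = process_add({"query": q}, patterns)
        print(q, "->", "indexed" if kept else "dropped")
```

Dropping the document at update time means blocked phrases never enter the suggestion core, so nothing has to be filtered at query time.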

Greetings,

----- Original Message -----
From: "Hamish Campbell"
To: solr-user@lucene.apache.org
Sent: Wednesday, January 15, 2014 9:10:16 PM
Subject: Re: Search Suggestion Filtering

Thanks Tomás, I'll take a look.

Still interested to hear from anyone about using queries to populate the
list - I'm willing to give up a bit of performance for the flexibility it
would provide.

On Thu, Jan 16, 2014 at 1:06 PM, Tomás Fernández Löbbe <tomasflobbe@gmail.com> wrote:

> I think your use case is the one described in LUCENE-5350, maybe you want
> to take a look at the patch and comments there.
>
> Tomás
>
>
> On Wed, Jan 15, 2014 at 12:58 PM, Hamish Campbell <hamish.campbell@koordinates.com> wrote:
>
> > Hi all,
> >
> > I'm looking into options for filtering the search suggestions dictionary.
> >
> > Using Solr 4.6.0, Suggester component and fst.FuzzyLookupFactory using a
> > field based dictionary, we're indexing records for a multi-tenanted SaaS
> > platform. SearchHandler records are always filtered by the particular
> > client warehouse (e.g. by domain), however we need a way to apply a
> > similar filter to the spell check dictionary to prevent leaking terms
> > between clients. In other words: when client A searches for a document
> > title they should not receive spelling suggestions for client B's
> > document titles.
> >
> > This has been asked a couple of times, on the mailing list and on
> > StackOverflow. Some of the suggested approaches:
> >
> > 1. Use dynamic fields to create dictionaries per-warehouse (mentioned
> > here:
> > http://lucene.472066.n3.nabble.com/Filtering-down-terms-in-suggest-tt4069627.html
> > )
> >
> > That might be a reasonable option for us (we already considered a similar
> > approach), but at what point does this stop scaling efficiently? How many
> > dynamic fields are too many?
> >
> > 2. Run a query to populate the suggestion list (also mentioned in that
> > thread)
> >
> > If I understand this correctly, this would give us a lot of flexibility
> > and power: for example to give a more nuanced result set using the user's
> > permissions to expose private documents in their spelling suggestions.
> >
> > I expect this would be a slow query, but our total document count is
> > currently relatively small (on the order of 10^3 objects) and I imagine
> > you could create a specific word index with the appropriate fields to
> > keep this in check. Is this a feasible approach, and if so, how do you
> > build a dynamic suggestion list?
> >
> > 3. Other options:
> >
> > It seems like this is a common problem - and we could throw some
> > resources at building an extension to provide some limited suggestion
> > dictionary filtering. Is anyone already doing something similar, or has
> > found a clever hack around this, or can suggest a starting point?
> >
> > Thanks everyone!
> >
> > --
> > Hamish Campbell
> > Koordinates Ltd
> > PH +64 9 966 0433
> > FAX +64 9 966 0045
> >
>

--
Hamish Campbell