From: Upayavira
To: solr-user@lucene.apache.org
Subject: Re: docValues: Can we apply synonym
Date: Fri, 29 May 2015 20:48:07 +0100

Use copyField to clone the field for faceting purposes.

Upayavira

On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
> Hi Erick,
>
> Thanks for the suggestion. We are using this query parser plugin
> (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
> synonyms. So it does work slower than edismax; that's why it is not in
> contrib, right?
> (I am asking this question because we are using it for all our
> searches to handle 10 multiword ice cube, icecube etc)
>
> *Moreover I thought of a solution for this docValue problem*
>
> I need to make the city field *multivalued*, and by this I mean I will add
> the synonym (*mumbai, bombay*) as an extra value to that field if present.
> Now the searching operation will work fine as before.
>
> *mumbai*, *bombay*
>
> The only problem is that we have to remove the 'city alias/synonym facets'
> when we are providing results to the clients.
>
> *mumbai, 1000*
>
> With Regards
> Aman Tandon
>
> On Fri, May 29, 2015 at 7:26 PM, Erick Erickson wrote:
>
> > Do take time for performance testing with that parser. It can be slow
> > depending on your data, as I remember. That said, it solves the problem
> > it set out to solve, so if it meets your SLAs, it can be a life-saver.
> >
> > Best,
> > Erick
> >
> > On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti wrote:
> > > Even if a little bit outdated, that query parser is really really cool to
> > > manage synonyms !
> > > +1 !
> > >
> > > 2015-05-29 1:01 GMT+01:00 Aman Tandon:
> > >
> > >> Thanks chris.
> > >>
> > >> Yes, we are using it for handling the multiword synonym problem.
> > >>
> > >> With Regards
> > >> Aman Tandon
> > >>
> > >> On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles <Charles.Reitzel@tiaa-cref.org> wrote:
> > >>
> > >> > Again, I would recommend using Nolan Lawson's
> > >> > SynonymExpandingExtendedDismaxQParserPlugin.
> > >> >
> > >> > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > >> >
> > >> > -----Original Message-----
> > >> > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > >> > Sent: Wednesday, May 27, 2015 6:42 PM
> > >> > To: solr-user@lucene.apache.org
> > >> > Subject: Re: docValues: Can we apply synonym
> > >> >
> > >> > OK, and which synonym processor are you talking about? Maybe it could help.
> > >> >
> > >> > With Regards
> > >> > Aman Tandon
> > >> >
> > >> > On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles <Charles.Reitzel@tiaa-cref.org> wrote:
> > >> >
> > >> > > Sorry, my bad. The synonym processor I mention works differently. It's
> > >> > > an extension of the EDisMax query processor and doesn't require field
> > >> > > level synonym configs.
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Reitzel, Charles [mailto:Charles.Reitzel@tiaa-cref.org]
> > >> > > Sent: Wednesday, May 27, 2015 6:12 PM
> > >> > > To: solr-user@lucene.apache.org
> > >> > > Subject: RE: docValues: Can we apply synonym
> > >> > >
> > >> > > But the query analysis isn't on a specific field, it is applied to the
> > >> > > query string.
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > >> > > Sent: Wednesday, May 27, 2015 6:08 PM
> > >> > > To: solr-user@lucene.apache.org
> > >> > > Subject: Re: docValues: Can we apply synonym
> > >> > >
> > >> > > Hi Charles,
> > >> > >
> > >> > > The problem here is that docValues works only with primitive data
> > >> > > types like String, int, etc. So how could we apply synonyms to a
> > >> > > primitive data type?
> > >> > >
> > >> > > With Regards
> > >> > > Aman Tandon
> > >> > >
> > >> > > On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles <Charles.Reitzel@tiaa-cref.org> wrote:
> > >> > >
> > >> > > > Is there any reason you cannot apply the synonyms at query time?
> > >> > > > Applying synonyms at indexing time has problems, e.g. polluting the
> > >> > > > term frequency for synonyms added, preventing distance queries, ...
> > >> > > >
> > >> > > > Since city names often have multiple terms, e.g. New York, Den
> > >> > > > Hague, etc., I would recommend using Nolan Lawson's
> > >> > > > SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less
> > >> > > > filling.
> > >> > > >
> > >> > > > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > >> > > >
> > >> > > > We found this to fix synonyms like "ny" for "New York" and vice versa.
> > >> > > > Haven't tried it with docValues, tho.
> > >> > > >
> > >> > > > -----Original Message-----
> > >> > > > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > >> > > > Sent: Tuesday, May 26, 2015 11:15 PM
> > >> > > > To: solr-user@lucene.apache.org
> > >> > > > Subject: Re: docValues: Can we apply synonym
> > >> > > >
> > >> > > > Yes it could be :)
> > >> > > >
> > >> > > > Anyway thanks for helping.
> > >> > > >
> > >> > > > With Regards
> > >> > > > Aman Tandon
> > >> > > >
> > >> > > > On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> > >> > > >
> > >> > > > > I should investigate that, as synonyms are usually handled at the
> > >> > > > > analysis stage. A simple way is to replace the word with all its
> > >> > > > > synonyms (including the original word), but simply using this kind
> > >> > > > > of processor will change the token positions and offsets, modifying
> > >> > > > > the actual content of the document.
> > >> > > > >
> > >> > > > > "I am from Bombay" will become "I am from Bombay Mumbai", which
> > >> > > > > can be annoying.
> > >> > > > > So a clever approach must be investigated.
> > >> > > > >
> > >> > > > > 2015-05-26 17:36 GMT+01:00 Aman Tandon:
> > >> > > > >
> > >> > > > > > Okay. So how could I do it with UpdateProcessors?
> > >> > > > > >
> > >> > > > > > With Regards
> > >> > > > > > Aman Tandon
> > >> > > > > >
> > >> > > > > > On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > mmm this is different !
> > >> > > > > > > Without any customisation, right now you could :
> > >> > > > > > > - use docValues to provide exact value facets.
> > >> > > > > > > - Then you can use a copy field, with the proper analysis, to
> > >> > > > > > > search when a user clicks on a filter !
> > >> > > > > > >
> > >> > > > > > > So you will see in your facets :
> > >> > > > > > > Mumbai(3)
> > >> > > > > > > Bombay(2)
> > >> > > > > > >
> > >> > > > > > > And when clicking you see 5 results.
> > >> > > > > > > A little bit misleading for the users …
> > >> > > > > > >
> > >> > > > > > > On the other hand, if you want to apply the synonyms earlier, in
> > >> > > > > > > the indexing pipeline (because a docValues field cannot be
> > >> > > > > > > analysed), I think you should play with UpdateProcessors.
> > >> > > > > > >
> > >> > > > > > > Cheers
> > >> > > > > > >
> > >> > > > > > > 2015-05-26 17:18 GMT+01:00 Aman Tandon <amantandon.10@gmail.com>:
> > >> > > > > > >
> > >> > > > > > > > We are interested in using docValues for better memory
> > >> > > > > > > > utilization and speed.
> > >> > > > > > > >
> > >> > > > > > > > Currently we are faceting the search results on *city*. In city
> > >> > > > > > > > we have also added synonyms for cities like mumbai, bombay
> > >> > > > > > > > (these are Indian cities), so that results for mumbai are also
> > >> > > > > > > > eligible when somebody applies the bombay filter on search
> > >> > > > > > > > results.
> > >> > > > > > > >
> > >> > > > > > > > I need this functionality to work with a docValues-enabled
> > >> > > > > > > > field.
> > >> > > > > > > >
> > >> > > > > > > > With Regards
> > >> > > > > > > > Aman Tandon
> > >> > > > > > > >
> > >> > > > > > > > On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > I checked in the Documentation to be sure, but apparently :
> > >> > > > > > > > >
> > >> > > > > > > > > DocValues are only available for specific field types. The
> > >> > > > > > > > > types chosen determine the underlying Lucene docValue type
> > >> > > > > > > > > that will be used. The available Solr field types are:
> > >> > > > > > > > >
> > >> > > > > > > > > - StrField and UUIDField.
> > >> > > > > > > > >    - If the field is single-valued (i.e., multi-valued is
> > >> > > > > > > > >      false), Lucene will use the SORTED type.
> > >> > > > > > > > >    - If the field is multi-valued, Lucene will use the
> > >> > > > > > > > >      SORTED_SET type.
> > >> > > > > > > > > - Any Trie* numeric fields and EnumField.
> > >> > > > > > > > >    - If the field is single-valued (i.e., multi-valued is
> > >> > > > > > > > >      false), Lucene will use the NUMERIC type.
> > >> > > > > > > > >    - If the field is multi-valued, Lucene will use the
> > >> > > > > > > > >      SORTED_SET type.
> > >> > > > > > > > >
> > >> > > > > > > > > This means you should not analyse a field where DocValues is
> > >> > > > > > > > > enabled.
> > >> > > > > > > > > Can you explain your use case to us ? Why are you interested
> > >> > > > > > > > > in synonyms at the DocValues level ?
> > >> > > > > > > > >
> > >> > > > > > > > > Cheers
> > >> > > > > > > > >
> > >> > > > > > > > > 2015-05-26 13:32 GMT+01:00 Upayavira:
> > >> > > > > > > > >
> > >> > > > > > > > > > To my understanding, docValues are just an uninverted index.
> > >> > > > > > > > > > That is, it contains the terms that are generated at the end
> > >> > > > > > > > > > of an analysis chain.
> > >> > > > > > > > > > Therefore, you simply enable docValues and include the
> > >> > > > > > > > > > SynonymFilterFactory in your analysis.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Is that enough, or are you struggling with some other issue?
> > >> > > > > > > > > >
> > >> > > > > > > > > > Upayavira
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
> > >> > > > > > > > > > > Hi,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > We have some field *city* in which the docValues are
> > >> > > > > > > > > > > enabled. We need to add the synonym in that field so how
> > >> > > > > > > > > > > could we do it?
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > With Regards
> > >> > > > > > > > > > > Aman Tandon
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > --------------------------
> > >> > > > > > > > >
> > >> > > > > > > > > Benedetti Alessandro
> > >> > > > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > > > > > >
> > >> > > > > > > > > "Tyger, tyger burning bright
> > >> > > > > > > > > In the forests of the night,
> > >> > > > > > > > > What immortal hand or eye
> > >> > > > > > > > > Could frame thy fearful symmetry?"
> > >> > > > > > > > >
> > >> > > > > > > > > William Blake - Songs of Experience -1794 England
> > >> > > >
> > >> > > > *************************************************************************
> > >> > > > This e-mail may contain confidential or privileged information.
> > >> > > > If you are not the intended recipient, please notify the sender
> > >> > > > immediately and then delete it.
> > >> > > >
> > >> > > > TIAA-CREF
> > >> > > > *************************************************************************
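
For reference, the copyField suggestion at the top of this message could be sketched in schema.xml roughly as below. This is only a sketch under assumptions: the names text_city_syn, city and city_facet are hypothetical, a "string" type backed by solr.StrField is assumed to exist already (as in the example schemas), and synonyms.txt is assumed to contain a line such as "mumbai, bombay". Nothing here is taken from anyone's actual schema in this thread.

```xml
<!-- Hypothetical names throughout: text_city_syn, city, city_facet. -->

<!-- Analysed type used for searching and filtering; synonyms are expanded at query time. -->
<fieldType name="text_city_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumes synonyms.txt holds a line such as: mumbai, bombay -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

<!-- Search and filter against the analysed field. -->
<field name="city" type="text_city_syn" indexed="true" stored="true"/>

<!-- Facet against the unanalysed clone, which can carry docValues. -->
<field name="city_facet" type="string" indexed="true" stored="false" docValues="true"/>

<!-- copyField copies the raw input value, before any analysis is applied. -->
<copyField source="city" dest="city_facet"/>
```

With something like this you would facet on city_facet and send filter queries to city. As Alessandro noted earlier in the thread, the facet list would still show Mumbai and Bombay as separate values, and single-token synonyms like these are the easy case; multi-word city names are where the query parser plugin discussed above comes in.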