lucene-solr-user mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: docValues: Can we apply synonym
Date Fri, 29 May 2015 19:48:07 GMT
Use copyField to clone the field for faceting purposes.
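
As a rough sketch of what that could look like in schema.xml: keep city as a
plain string field with docValues for faceting, and copy it into an analyzed
field that carries the synonyms for searching and filtering. The names
city_search, text_city_syn and city_synonyms.txt are made up for illustration,
and I am assuming a Solr 5.x schema where SynonymFilterFactory is the synonym
filter in use.

```xml
<!-- Exact-value field used for faceting. docValues needs a non-analyzed
     type such as StrField, so no synonyms are applied here. -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>

<!-- Analyzed clone used for searching and filter queries.
     The name city_search is hypothetical. -->
<field name="city_search" type="text_city_syn" indexed="true" stored="false"/>

<!-- Copy the raw city value into the analyzed clone at index time. -->
<copyField source="city" dest="city_search"/>

<!-- Hypothetical field type. city_synonyms.txt is assumed to hold
     lines such as: mumbai, bombay -->
<fieldType name="text_city_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="city_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With something like this you would facet on city and send the filter query
against city_search, so a filter on bombay should also match documents indexed
with mumbai. The facet list itself still shows the raw values.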

Upayavira

On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
> Hi Erick,
> 
> Thanks for the suggestion. We are using this query parser plugin
> (*SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
> synonyms. So it works slower than edismax, and that's why it is not in
> contrib, right? (I am asking because we use it for all our searches
> to handle 10 multi-word synonyms such as ice cube, icecube, etc.)
> 
> *Moreover, I thought of a solution for this docValues problem*
> 
> I need to make the city field *multivalued*: I will add the synonym
> (*mumbai, bombay*) as an extra value to that field if present.
> Searching will then work as before.
> 
> >
> > *<field name="city">mumbai</field><field name="city">bombay</field>*
> 
> 
> The only problem is that we have to remove the city alias/synonym
> facets when we are providing results to the clients.
> 
> *mumbai, 1000*
> 
> 
> With Regards
> Aman Tandon
> 
> On Fri, May 29, 2015 at 7:26 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
> 
> > Do take time for performance testing with that parser. It can be slow
> > depending on your data, as I remember. That said, it solves the problem
> > it set out to solve, so if it meets your SLAs, it can be a life-saver.
> >
> > Best,
> > Erick
> >
> >
> > On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
> > <benedetti.alex85@gmail.com> wrote:
> > > Even if a little bit outdated, that query parser is really really cool to
> > > manage synonyms !
> > > +1 !
> > >
> > > 2015-05-29 1:01 GMT+01:00 Aman Tandon <amantandon.10@gmail.com>:
> > >
> > >> Thanks chris.
> > >>
> > >> Yes we are using it for handling multiword synonym problem.
> > >>
> > >> With Regards
> > >> Aman Tandon
> > >>
> > >> On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles <
> > >> Charles.Reitzel@tiaa-cref.org> wrote:
> > >>
> > >> > Again, I would recommend using Nolan Lawson's
> > >> > SynonymExpandingExtendedDismaxQParserPlugin.
> > >> >
> > >> > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > >> >
> > >> > -----Original Message-----
> > >> > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > >> > Sent: Wednesday, May 27, 2015 6:42 PM
> > >> > To: solr-user@lucene.apache.org
> > >> > Subject: Re: docValues: Can we apply synonym
> > >> >
> > >> > OK, which synonym processor are you talking about? Maybe it could help.
> > >> >
> > >> > With Regards
> > >> > Aman Tandon
> > >> >
> > >> > On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles <
> > >> > Charles.Reitzel@tiaa-cref.org> wrote:
> > >> >
> > >> > > Sorry, my bad. The synonym processor I mention works differently.
> > >> > > It's an extension of the EDisMax query processor and doesn't require
> > >> > > field-level synonym configs.
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Reitzel, Charles [mailto:Charles.Reitzel@tiaa-cref.org]
> > >> > > Sent: Wednesday, May 27, 2015 6:12 PM
> > >> > > To: solr-user@lucene.apache.org
> > >> > > Subject: RE: docValues: Can we apply synonym
> > >> > >
> > >> > > But the query analysis isn't on a specific field, it is applied to
> > >> > > the query string.
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > >> > > Sent: Wednesday, May 27, 2015 6:08 PM
> > >> > > To: solr-user@lucene.apache.org
> > >> > > Subject: Re: docValues: Can we apply synonym
> > >> > >
> > >> > > Hi Charles,
> > >> > >
> > >> > > The problem here is that docValues work only with primitive data
> > >> > > types like String, int, etc. So how could we apply synonyms to a
> > >> > > primitive data type?
> > >> > >
> > >> > > With Regards
> > >> > > Aman Tandon
> > >> > >
> > >> > > On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles <
> > >> > > Charles.Reitzel@tiaa-cref.org> wrote:
> > >> > >
> > >> > > > Is there any reason you cannot apply the synonyms at query time?
> > >> > > > Applying synonyms at indexing time has problems, e.g. polluting the
> > >> > > > term frequency for synonyms added, preventing distance queries, ...
> > >> > > >
> > >> > > > Since city names often have multiple terms, e.g. New York, Den
> > >> > > > Hague, etc., I would recommend using Nolan Lawson's
> > >> > > > SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less
> > >> > > > filling.
> > >> > > >
> > >> > > > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > >> > > >
> > >> > > > We found this to fix synonyms like "ny" for "New York" and vice
> > >> > > > versa. Haven't tried it with docValues, tho.
> > >> > > >
> > >> > > > -----Original Message-----
> > >> > > > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > >> > > > Sent: Tuesday, May 26, 2015 11:15 PM
> > >> > > > To: solr-user@lucene.apache.org
> > >> > > > Subject: Re: docValues: Can we apply synonym
> > >> > > >
> > >> > > > Yes it could be :)
> > >> > > >
> > >> > > > Anyway thanks for helping.
> > >> > > >
> > >> > > > With Regards
> > >> > > > Aman Tandon
> > >> > > >
> > >> > > > On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti <
> > >> > > > benedetti.alex85@gmail.com> wrote:
> > >> > > >
> > >> > > > > I should investigate that, as synonyms are usually applied at the
> > >> > > > > analysis stage.
> > >> > > > > A simple way is to replace the word with all its synonyms
> > >> > > > > (including the original word), but simply using this kind of
> > >> > > > > processor will change the token positions and offsets, modifying
> > >> > > > > the actual content of the document.
> > >> > > > >
> > >> > > > > "I am from Bombay" will become "I am from Bombay Mumbai", which
> > >> > > > > can be annoying.
> > >> > > > > So a clever approach must be investigated.
> > >> > > > >
> > >> > > > > 2015-05-26 17:36 GMT+01:00 Aman Tandon <amantandon.10@gmail.com>:
> > >> > > > >
> > >> > > > > > Okay So how could I do it with UpdateProcessors?
> > >> > > > > >
> > >> > > > > > With Regards
> > >> > > > > > Aman Tandon
> > >> > > > > >
> > >> > > > > > On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti <
> > >> > > > > > benedetti.alex85@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > mmm this is different!
> > >> > > > > > > Without any customisation, right now you could:
> > >> > > > > > > - use docValues to provide exact value facets.
> > >> > > > > > > - Then you can use a copy field, with the proper analysis,
> > >> > > > > > >   to search when a user clicks on a filter!
> > >> > > > > > >
> > >> > > > > > > So you will see in your facets:
> > >> > > > > > > Mumbai(3)
> > >> > > > > > > Bombay(2)
> > >> > > > > > >
> > >> > > > > > > And when clicking you see 5 results.
> > >> > > > > > > A little bit misleading for the users …
> > >> > > > > > >
> > >> > > > > > > On the other hand, if you want to apply the synonyms before,
> > >> > > > > > > in the indexing pipeline (because a docValues field cannot
> > >> > > > > > > be analysed), I think you should play with UpdateProcessors.
> > >> > > > > > >
> > >> > > > > > > Cheers
> > >> > > > > > >
> > >> > > > > > > 2015-05-26 17:18 GMT+01:00 Aman Tandon <amantandon.10@gmail.com>:
> > >> > > > > > >
> > >> > > > > > > > We are interested in using docValues for better memory
> > >> > > > > > > > utilization and speed.
> > >> > > > > > > >
> > >> > > > > > > > Currently we are faceting the search results on *city*. In
> > >> > > > > > > > city we have also added synonyms for cities like mumbai,
> > >> > > > > > > > bombay (these are Indian cities), so that results for mumbai
> > >> > > > > > > > are also eligible when somebody applies a filter of bombay
> > >> > > > > > > > on the search results.
> > >> > > > > > > >
> > >> > > > > > > > I need this functionality to work with a docValues-enabled
> > >> > > > > > > > field.
> > >> > > > > > > >
> > >> > > > > > > > With Regards
> > >> > > > > > > > Aman Tandon
> > >> > > > > > > >
> > >> > > > > > > > On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti <
> > >> > > > > > > > benedetti.alex85@gmail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > I checked in the documentation to be sure, but apparently:
> > >> > > > > > > > >
> > >> > > > > > > > > DocValues are only available for specific field types. The
> > >> > > > > > > > > types chosen determine the underlying Lucene docValue type
> > >> > > > > > > > > that will be used. The available Solr field types are:
> > >> > > > > > > > >
> > >> > > > > > > > >    - StrField and UUIDField.
> > >> > > > > > > > >       - If the field is single-valued (i.e., multi-valued is
> > >> > > > > > > > >         false), Lucene will use the SORTED type.
> > >> > > > > > > > >       - If the field is multi-valued, Lucene will use the
> > >> > > > > > > > >         SORTED_SET type.
> > >> > > > > > > > >    - Any Trie* numeric fields and EnumField.
> > >> > > > > > > > >       - If the field is single-valued (i.e., multi-valued is
> > >> > > > > > > > >         false), Lucene will use the NUMERIC type.
> > >> > > > > > > > >       - If the field is multi-valued, Lucene will use the
> > >> > > > > > > > >         SORTED_SET type.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > This means you should not analyse a field where DocValues
> > >> > > > > > > > > is enabled.
> > >> > > > > > > > > Can you explain your use case to us? Why are you interested
> > >> > > > > > > > > in synonyms at the DocValues level?
> > >> > > > > > > > >
> > >> > > > > > > > > Cheers
> > >> > > > > > > > >
> > >> > > > > > > > > 2015-05-26 13:32 GMT+01:00 Upayavira <uv@odoko.co.uk>:
> > >> > > > > > > > >
> > >> > > > > > > > > > To my understanding, docValues are just an uninverted
> > >> > > > > > > > > > index. That is, it contains the terms that are generated
> > >> > > > > > > > > > at the end of an analysis chain. Therefore, you simply
> > >> > > > > > > > > > enable docValues and include the SynonymFilterFactory in
> > >> > > > > > > > > > your analysis.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Is that enough, or are you struggling with some other
> > >> > > > > > > > > > issue?
> > >> > > > > > > > > >
> > >> > > > > > > > > > Upayavira
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
> > >> > > > > > > > > > > Hi,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > We have some field *city* in which the docValues are
> > >> > > > > > > > > > > enabled. We need to add the synonym in that field so
> > >> > > > > > > > > > > how could we do it?
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > With Regards
> > >> > > > > > > > > > > Aman Tandon
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > --------------------------
> > >> > > > > > > > >
> > >> > > > > > > > > Benedetti Alessandro
> > >> > > > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > > > > > >
> > >> > > > > > > > > "Tyger, tyger burning bright
> > >> > > > > > > > > In the forests of the night,
> > >> > > > > > > > > What immortal hand or eye
> > >> > > > > > > > > Could frame thy fearful symmetry?"
> > >> > > > > > > > >
> > >> > > > > > > > > William Blake - Songs of Experience -1794 England
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > *************************************************************************
> > >> > > > This e-mail may contain confidential or privileged information.
> > >> > > > If you are not the intended recipient, please notify the sender
> > >> > > > immediately and then delete it.
> > >> > > >
> > >> > > > TIAA-CREF
> > >> > > > *************************************************************************
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> >
