lucene-solr-user mailing list archives

From "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
Subject RE: Multi word synonyms
Date Tue, 15 Nov 2016 15:45:44 GMT
Midas,

Apparently I didn't read carefully enough. Ted Sullivan's AutoPhrasingTokenFilter is configured with
a file, "autophrases.txt", and it only recognizes phrases that are listed in that file. Because of
this, it doesn't seem directly applicable to your problem of multi-word synonym matching at query
time: it won't know which terms to clump together.
Here's Ted Sullivan's earlier post on the token filter: https://lucidworks.com/blog/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
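To make the limitation concrete, here is a minimal Python sketch of the general auto-phrasing idea. It is not Ted Sullivan's Java code, only an illustration assuming a greedy longest-match over a phrase list and an underscore joiner: token runs that appear in the list are merged into one token, and everything else passes through untouched, so a phrase missing from the file is never clumped.

```python
def auto_phrase(tokens, phrases, joiner="_"):
    """Greedy longest-match: join runs of tokens that form a known phrase.

    `phrases` stands in for the lines of autophrases.txt (an assumption).
    Token runs that are not in the set pass through unchanged.
    """
    # Pre-split phrases so token tuples can be compared; note the longest one.
    phrase_tuples = {tuple(p.split()) for p in phrases}
    max_len = max((len(p) for p in phrase_tuples), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, shrinking down to 2 tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(tokens[i:i + n])
            if window in phrase_tuples:
                out.append(joiner.join(window))
                i += n
                break
        else:
            # No listed phrase starts here: emit the single token as-is.
            out.append(tokens[i])
            i += 1
    return out


phrases = {"new york", "new york city", "big apple"}
print(auto_phrase("visit new york city today".split(), phrases))
# ['visit', 'new_york_city', 'today']
print(auto_phrase("new jersey".split(), phrases))
# ['new', 'jersey']  ("new jersey" is not in the list, so it is not clumped)
```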

I would therefore ask your users or their representative about the priority of this feature/requirement.

Going further, one approach would be to use an NLP toolkit such as OpenNLP or StanfordNLP
(both Java), or Python's NLTK, to identify noun phrases in your text/corpus, and then use those
to build autophrases.txt. You wouldn't need to process your whole corpus to get reasonably good
accuracy, because after a point new noun phrases become rare. You may need to experiment with
which phrases to include, and hence with the size of autophrases.txt, depending on how
AutoPhrasingTokenFilter is implemented and the indexing rate you need to sustain. Depending on
your programming experience, you could do this even though, as you mentioned, you are new to Solr.
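A proper noun-phrase chunker from one of those toolkits is the better tool. As a dependency-free stand-in, the sketch below swaps in a much cruder technique, counting frequent 2- and 3-word sequences that neither start nor end with a stopword, and writes the survivors one per line (assumed here to be the autophrases.txt layout). The stopword list, the thresholds and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "for", "is", "are"}


def candidate_phrases(docs, n_values=(2, 3), min_count=2):
    """Return frequent n-grams that neither start nor end with a stopword.

    This is a frequency heuristic, NOT noun-phrase chunking; it only stands
    in for what OpenNLP, StanfordNLP or NLTK would do properly.
    """
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z0-9]+", doc.lower())
        for n in n_values:
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep phrases seen at least min_count times, most frequent first.
    return [p for p, c in counts.most_common() if c >= min_count]


def write_autophrases(phrases, path="autophrases.txt"):
    # One phrase per line (assumed file layout).
    with open(path, "w", encoding="utf-8") as f:
        for p in phrases:
            f.write(p + "\n")
```

On three toy titles, candidate_phrases(["Heart attack symptoms in adults", "Signs of a heart attack", "heart attack recovery time"]) keeps only "heart attack", the one sequence seen at least twice. Reviewing the list by hand before writing the file is advisable, since with a lower min_count frequency alone also lets through fragments such as "attack symptoms".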

-----Original Message-----
From: Davis, Daniel (NIH/NLM) [C] 
Sent: Tuesday, November 15, 2016 10:22 AM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

I'm not as expert as some on this list, but from reading the suggested article, https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/,
the approach is this:

- Have one field that takes text as normal
- Copy that field to another field, whose field type uses the AutoPhrasingTokenFilter
- Configure your request handler to query against both fields
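For the third step, one way to query both fields is Solr's edismax parser with both fields listed in qf. The sketch below only builds the request URL (Python standard library, no HTTP call); the host, the core name (mycore), the field names and the ^2 boost are hypothetical. Adding debugQuery=true is one way to see how each field's analyzer rewrote the query, which also bears on the original question about the debug output.

```python
from urllib.parse import urlencode

# Hypothetical names: adjust the host, core and field names to your schema.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"
PLAIN_FIELD = "text"               # field with the normal analysis chain
PHRASE_FIELD = "text_autophrase"   # copyField target using AutoPhrasingTokenFilter


def build_query_url(user_query, debug=False):
    """Build an edismax request that searches both the plain and phrase fields."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # qf lists the fields edismax searches; the ^2 boost favouring the
        # auto-phrased field is an assumption, not a recommendation.
        "qf": f"{PLAIN_FIELD} {PHRASE_FIELD}^2",
        "wt": "json",
    }
    if debug:
        # debugQuery=true makes Solr report how the query was parsed.
        params["debugQuery"] = "true"
    return SOLR_SELECT + "?" + urlencode(params)


print(build_query_url("heart attack", debug=True))
```

One caveat: whether the phrase field's query-time analyzer ever sees "heart attack" as a whole, rather than "heart" and "attack" separately, depends on the query parser splitting on whitespace first, which is the issue tracked in LUCENE-2605, mentioned further down this thread.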

You don't know the list of synonyms at query time, but now you have another field that contains
phrases, not words, and so you can indeed use synonym matching at query time against this
secondary field.   You can even use the multi-word phrases in the copied field to suggest
to admin users a list of candidate synonyms.
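The point about synonyms on the secondary field can be sketched conceptually in Python: once a phrase is a single token such as heart_attack, a plain token-to-token synonym lookup is enough. The second function ranks multi-word terms that have no synonym entry yet, as candidates for admin users. The term_freqs input (term to document frequency for the phrase field, which one might pull via Solr's TermsComponent), the underscore joiner and the example data are assumptions.

```python
def expand_with_synonyms(phrase_tokens, synonyms):
    """Expand each auto-phrased token with its synonyms (single-token lookup).

    Because a phrase like "heart_attack" is already one token, a plain
    token-to-token map is enough; no multi-word matching is needed here.
    """
    return [[tok] + synonyms.get(tok, []) for tok in phrase_tokens]


def suggest_synonym_candidates(term_freqs, synonyms, joiner="_", top=10):
    """Rank multi-word terms from the phrase field that have no synonyms yet.

    `term_freqs` maps term -> document frequency (hypothetical input).
    Ties on frequency are broken alphabetically.
    """
    candidates = [
        (term, freq)
        for term, freq in term_freqs.items()
        if joiner in term and term not in synonyms
    ]
    candidates.sort(key=lambda tf: (-tf[1], tf[0]))
    return [term for term, _ in candidates[:top]]
```

For example, with synonyms = {"heart_attack": ["myocardial_infarction"]}, expand_with_synonyms(["treat", "heart_attack"], synonyms) returns [["treat"], ["heart_attack", "myocardial_infarction"]], and suggest_synonym_candidates({"heart_attack": 40, "blood_pressure": 25, "heart": 90, "kidney_stone": 25}, synonyms) returns ["blood_pressure", "kidney_stone"].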

-----Original Message-----
From: Midas A [mailto:test.midas@gmail.com]
Sent: Tuesday, November 15, 2016 7:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore <v.damore@gmail.com> wrote:

> Hi Michael,
>
> An update: while reading the article, I double-checked whether at least one of the
> issues had been fixed. The good news is that
> https://issues.apache.org/jira/browse/LUCENE-2605 has been closed, and the
> fix is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann <kuli@solr.info> wrote:
>
> > This is nice reading, but that solution depends on the
> > precondition that you already know your synonyms at index time.
> >
> > While having synonyms in the index is usually the better solution
> > anyway, it's sometimes not feasible.
> >
> > -Michael
> >
> > On 15.11.2016 at 12:14, Vincenzo D'Amore wrote:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann <kuli@solr.info> wrote:
> > >
> > >> It doesn't work out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
> > >>
> > >> It works nicely, but it can lead to an OOME (OutOfMemoryError) when you add many
> > >> synonyms with multiple terms. And I'm not sure whether it still
> > >> works with Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> On 15.11.2016 at 10:29, Midas A wrote:
> > >>> - I have to use multi-word synonyms at query time.
> > >>>
> > >>> Please suggest how I can do it,
> > >>> and let me know whether it would be visible in the debug query output or not.
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.damore@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>