lucene-solr-user mailing list archives

From John Blythe <j...@curvolabs.com>
Subject Re: Multi word synonyms
Date Sun, 26 Mar 2017 23:04:43 GMT
Sure thing. Post back with what you find!

Good luck-

On Sun, Mar 26, 2017 at 3:36 PM Sanjana Sridhar <sanjana.sridhar@flipp.com>
wrote:

> Hi John,
>
> Thanks for letting me know what works for you. I'm going to try that out.
> Sounds like a suitable solution to my problem.
>
> Best,
> Sanjana
>
>
>
> On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <john@curvolabs.com> wrote:
>
> > I use the keyword tokenizer and then a pattern replace filter to turn
> > multi-word terms into underscore-connected tokens. For instance, "Burger
> > Joint" becomes "burger_joint", which is then looked up in my synonym
> > filter against underscored synonyms. When it matches, I either replace
> > the underscores with spaces or pass the token to the word delimiter
> > filter factory before further processing.
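> >
> > A minimal sketch of how such a chain might look in the schema, not
> > necessarily the exact configuration used here. The field type name
> > "text_multiword_syn" and the file "synonyms_underscored.txt" are
> > hypothetical, and the lowercase step is an assumption inferred from the
> > "burger_joint" example. The synonyms file would hold underscored entries
> > such as "coke, coca_cola". Because the keyword tokenizer emits the whole
> > input as a single token, this only matches when the entire field value
> > or query is the phrase.
> >
> > ```xml
> > <!-- Hypothetical field type; names are illustrative only -->
> > <fieldType name="text_multiword_syn" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <!-- Keep the whole input as one token, e.g. "Burger Joint" -->
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <!-- Assumed step: lowercase so "Burger Joint" becomes "burger joint" -->
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <!-- Join words with underscores: "burger joint" becomes "burger_joint" -->
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="_" replace="all"/>
> >     <!-- Look up the underscored token in a file of underscored synonyms -->
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_underscored.txt" ignoreCase="true" expand="true"/>
> >     <!-- Turn underscores back into spaces on the resulting tokens -->
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" " replace="all"/>
> >   </analyzer>
> > </fieldType>
> > ```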
> >
> >
> > On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
> > sanjana.sridhar@wishabi.com> wrote:
> >
> > > Hello,
> > >
> > > Does anyone have a good solution for working with multi word synonyms?
> > > I've been reading a lot about this online and haven't really found a
> > > great solution to it. I use the SynonymFilterFactory at index time, but
> > > words don't really get matched to the appropriate multi word synonyms,
> > > even though using the Analysis tool shows that it should be matched.
> > >
> > > Examples:
> > >
> > > coke, coca cola
> > >
> > >
> > >
> > > This is the configuration I have on text fields:
> > >
> > > <fieldType name="text_icu_english" class="solr.TextField"
> > >            positionIncrementGap="100" multiValued="true">
> > >   <analyzer type="index">
> > >     <!-- The white space tokenizer splits on white space but preserves
> > >          the tokens so that it can be used by the next filter -->
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.SynonymFilterFactory" ignoreCase="true"
> > >             expand="true" synonyms="synonyms.txt"/>
> > >     <!-- This filter splits a word on punctuation, preserves the
> > >          original, concatenates the split words and also stems english
> > >          possessive nouns -->
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             generateWordParts="0" generateNumberParts="0"
> > >             splitOnCaseChange="0" preserveOriginal="1" catenateWords="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > >     <filter class="solr.ICUFoldingFilterFactory"/>
> > >     <filter class="solr.PatternReplaceFilterFactory"
> > >             pattern="(.*[\*].*)" replacement=""/>
> > >     <filter class="solr.TrimFilterFactory"/>
> > >     <filter class="solr.LengthFilterFactory" min="1" max="100"/>
> > >     <filter class="solr.ClassicFilterFactory"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <!-- The white space tokenizer splits on white space but preserves
> > >          the tokens so that it can be used by the next filter -->
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <!-- This filter splits a word on punctuation, preserves the
> > >          original, concatenates the split words and also stems english
> > >          possessive nouns -->
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             generateWordParts="0" generateNumberParts="0"
> > >             splitOnCaseChange="0" preserveOriginal="1" catenateWords="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > >     <filter class="solr.ICUFoldingFilterFactory"/>
> > >     <filter class="solr.ClassicFilterFactory"/>
> > >   </analyzer>
> > >   <similarity class="solr.BM25SimilarityFactory">
> > >     <float name="b">0.0</float>
> > >   </similarity>
> > > </fieldType>
> > >
> > >
> > > Greatly appreciate any help y'all can offer.
> > >
> > > Thanks,
> > > Sanjana
> > >
> > > --
> > > IMPORTANT NOTICE:  This message, including any attachments (hereinafter
> > > collectively referred to as "Communication"), is intended only for the
> > > addressee(s) named above.  This Communication may include information
> > > that is privileged, confidential and exempt from disclosure under
> > > applicable law.  If the recipient of this Communication is not the
> > > intended recipient, or the employee or agent responsible for delivering
> > > this Communication to the intended recipient, you are notified that any
> > > dissemination, distribution or copying of this Communication is strictly
> > > prohibited.  If you have received this Communication in error, please
> > > notify the sender immediately by phone or email and permanently delete
> > > this Communication from your computer without making a copy. Thank you.
> > >
> > --
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | john@curvolabs.com
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
>
>
>
> --
>
> <http://corp.flipp.com/>
>
> Sanjana Sridhar
> Flipp Corporation
>
> p: 647-217-3599
> e: sanjana.sridhar@flipp.com
>
>