lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Auto-suggest in Solr
Date Sat, 27 Jun 2015 22:51:19 GMT
Thanks, Erick. I didn't have time to go through the code again,
but I will forward this to the dev list.
Thank you for your time!

Cheers

2015-06-27 16:19 GMT+01:00 Erick Erickson <erickerickson@gmail.com>:

> Alessandro:
>
> Going to have to defer to Mike McCandless et al.; they're the
> authorities here. I don't quite know whether they monitor this list,
> so consider the dev list?
>
> Best,
> Erick
>
> On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
> <benedetti.alex85@gmail.com> wrote:
> > Bump. Could anyone kindly take a look at my considerations about the
> > FreeText Suggester?
> > I am curious to gain more insight.
> > Eventually I will analyse the code in depth to find where I went wrong.
> >
> > Cheers
> >
> > 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.alex85@gmail.com>:
> >
> >> Actually the documentation is not clear enough.
> >> Let's try to understand this suggester.
> >>
> >> *Building*
> >> This suggester builds an FST that it uses to provide the autocomplete
> >> feature by running prefix searches on it.
> >> The terms it uses to generate the FST are the tokens produced by the
> >> "suggestFreeTextAnalyzerFieldType".
> >>
> >> That much should be correct.
> >> To keep it simple, suppose our analysis chain has a shingle token filter
> >> with sizes 1-3 (so unigrams are produced as well). From these original
> >> field values:
> >> "mp3 ipod"
> >> "mp3 player"
> >> "mp3 player ipod"
> >> "player of Real"
> >>
> >> -> we produce this list of possible suggestions in our FST:
> >>
> >> <mp3>
> >> <player>
> >> <ipod>
> >> <real>
> >> <of>
> >>
> >> <mp3 ipod>
> >> <mp3 player>
> >> <player ipod>
> >> <player of>
> >> <of real>
> >>
> >> <mp3 player ipod>
> >> <player of real>
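
As a rough illustration of the build step described above, the following Python sketch emulates a shingle filter with sizes 1 to 3 (unigrams included) over the four sample field values. The `shingles` and `build_entries` helpers are hypothetical names introduced here, and the lowercasing and whitespace splitting are assumptions inferred from the listed output. It models the analysis chain as described in this message, not Lucene's actual FreeTextSuggester code.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive tokens, space-joined."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def build_entries(values, max_size=3):
    """Collect the distinct shingles over all field values (assumed lowercased)."""
    entries = set()
    for value in values:
        # Assumption: whitespace tokenization plus lowercasing.
        entries.update(shingles(value.lower().split(), 1, max_size))
    return entries


FIELD_VALUES = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
entries = build_entries(FIELD_VALUES)

# Print the entries grouped by shingle size.
for size in (1, 2, 3):
    print(size, sorted(e for e in entries if len(e.split()) == size))
```

Under these assumptions the sketch yields the same five unigrams, five bigrams and two trigrams as the list above.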
> >>
> >> From the documentation I read:
> >>
> >>> " ngrams: The max number of tokens out of which singles will be make
> >>> the dictionary. The default value is 2. Increasing this would mean you
> >>> want more than the previous 2 tokens to be taken into consideration
> >>> when making the suggestions. "
> >>
> >>
> >> This confuses me, as I was not expecting this parameter to affect the
> >> suggestion dictionary.
> >> So I would like a clarification here from our masters :)
> >> Now let's see what happens at query time.
> >>
> >> *Query Time*
> >> As I understand it, with ngrams set to N the suggester considers only the
> >> last N-1 whitespace-separated tokens the user typed.
> >>
> >>> "Builds an ngram model from the text sent to {@link #build} and
> >>> predicts based on the last grams-1 tokens in the request sent to
> >>> {@link #lookup}. This tries to handle the "long tail" of suggestions
> >>> for when the incoming query is a never before seen query string."
> >>
> >>
> >> Example: grams=3 should consider only the last 2 tokens:
> >>
> >> special mp3 p -> mp3 p
> >>
> >> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> >> We produce 3 tokens:
> >> <mp3>
> >> <p>
> >> <mp3 p>
> >>
> >> And we run the prefix matching on the FST.
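
The query-time reading described above could be sketched in Python as follows. This only models the understanding stated in this message (which, as noted in the conclusion, does not match the observed behaviour); it is not the actual FreeTextSuggester lookup. The helpers `last_context` and `prefix_matches`, and the plain set standing in for the FST, are hypothetical.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive tokens, space-joined."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def last_context(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    tokens = query.lower().split()
    keep = max(grams - 1, 1)
    return tokens[-keep:]


def prefix_matches(prefix, entries):
    """Stand-in for an FST prefix search: entries starting with the prefix."""
    return sorted(e for e in entries if e.startswith(prefix))


# The shingles produced at build time from the sample field values.
ENTRIES = {
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
}

context = last_context("special mp3 p", grams=3)   # "special" is dropped
query_tokens = shingles(context, 1, 3)             # unigrams plus the bigram

for token in query_tokens:
    print(repr(token), "->", prefix_matches(token, ENTRIES))
```

Under this model, the token "mp3 p" would match only the entries beginning with "mp3 p", which is the narrowing the original question asks for.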
> >>
> >> *Conclusion*
> >> My understanding must be wrong at some point, as the behaviour I observe
> >> is different.
> >> Can we discuss and clarify this, and eventually put it in the official
> >> documentation?
> >>
> >> Cheers
> >>
> >> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> >>
> >>> I'm implementing an auto-suggest feature in Solr, and I'd like to
> >>> achieve the following:
> >>>
> >>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
> >>> "mp3 nano" and "mp3 music".
> >>> When the user enters "mp3 p", the suggestion should narrow down to "mp3
> >>> player".
> >>>
> >>> Currently, when I type "mp3 p", the suggester returns only words that
> >>> start with the letter "p", so I'm getting results like "plan",
> >>> "production", etc., and it does not take the "mp3" token into
> >>> consideration.
> >>>
> >>> I'm using Solr 5.1 and below is my configuration:
> >>>
> >>> In solrconfig.xml:
> >>>
> >>> <searchComponent name="suggest" class="solr.SuggestComponent">
> >>>   <lst name="suggester">
> >>>     <str name="lookupImpl">FreeTextLookupFactory</str>
> >>>     <str name="indexPath">suggester_freetext_dir</str>
> >>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >>>     <str name="field">Suggestion</str>
> >>>     <str name="weightField">Project</str>
> >>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
> >>>     <int name="ngrams">5</int>
> >>>     <str name="buildOnStartup">false</str>
> >>>     <str name="buildOnCommit">false</str>
> >>>   </lst>
> >>> </searchComponent>
> >>>
> >>>
> >>> In schema.xml
> >>>
> >>> <fieldType name="suggestType" class="solr.TextField"
> >>>            positionIncrementGap="100">
> >>>   <analyzer type="index">
> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> >>>             maxShingleSize="6" outputUnigrams="false"/>
> >>>   </analyzer>
> >>>   <analyzer type="query">
> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> >>>             maxShingleSize="6" outputUnigrams="true"/>
> >>>   </analyzer>
> >>> </fieldType>
> >>>
> >>>
> >>> Is there anything that I configured wrongly?
> >>>
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>
> >>
> >>
> >> --
> >> --------------------------
> >>
> >> Benedetti Alessandro
> >> Visiting card : http://about.me/alessandro_benedetti
> >>
> >> "Tyger, tyger burning bright
> >> In the forests of the night,
> >> What immortal hand or eye
> >> Could frame thy fearful symmetry?"
> >>
> >> William Blake - Songs of Experience -1794 England
> >>
> >
> >
> >
>



