lucene-solr-user mailing list archives

From Ryan Yacyshyn <ryan.yacys...@gmail.com>
Subject Re: AW: AW: Keeping capitalization in suggestions?
Date Wed, 10 Dec 2014 05:26:48 GMT
Hi Clemens,

I recently added typeahead functionality to something I'm playing with and
I used the EdgeNGramFilterFactory to help. I just tried this out after
adding a doc with "Chamäleon" in my title.

I got "Chamäleon" back, with a capital C, when I searched for
chama, Chama, chamã, and Chamã.

Here's what I have in my files:

-----------------
solrconfig.xml:

<requestHandler name="/suggest_movie" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="defType">edismax</str>
    <str name="rows">10</str>
    <!-- keep the response as lean as possible, so don't return header info -->
    <str name="omitHeader">true</str>
    <!-- only return 'title', and call that key 'value' in the response -->
    <str name="fl">value:title</str>
    <!-- boost title so that an exact match with the query shows on top -->
    <str name="qf">title^10 suggest_ngram</str>
  </lst>
</requestHandler>

-----------------
schema.xml:

<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
    <!-- create edge n-grams of each term when indexing, not when querying -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
  </analyzer>
</fieldType>

...

<field name="suggest_ngram" type="text_suggest_ngram" indexed="true" stored="false" />

...

<copyField source="title" dest="suggest_ngram" />

-----------------
request:

http://localhost:8983/solr/movies/suggest_movie?q=chama

-----------------
response:

{
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
            {
                "value": "Chamäleon"
            }
        ]
    }
}

Hope this helps?

Ryan




On Tue Dec 09 2014 at 7:21:02 AM Michael Sokolov <msokolov@safaribooksonline.com> wrote:

> Clemens --
>
>    what I do (see the book-title suggestions on $EMPLOYER's web
> site) is define a field with no analysis (type=keyword, i.e. use
> KeywordAnalyzer) and build the suggestions from that. Then tell AIS
> (AnalyzingInfixSuggester) to use an analyzer internally to pick out
> words from that (StandardAnalyzer or WhitespaceAnalyzer, with
> LowerCaseFilter, or however you want matching to work in the
> suggester). It will return the terms from the source field.
>
> You didn't show the definition of your "suggest" field - I expect it
> must be analyzed, right?  Just don't do that.
>
> -Mike
>
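A minimal sketch of the setup Mike describes, assuming Solr 4.7 or later, where `solr.SuggestComponent` is available (the thread itself uses the older SpellCheckComponent-based Suggester). The field `suggest_raw` and the source field `title` are hypothetical names; `text_suggest` is the type from Clemens's original schema with LowerCaseFilterFactory enabled. The idea is that matching runs through the lowercasing analyzer, while DocumentDictionaryFactory supplies the stored, original-case text that gets returned. Depending on the release, DocumentDictionaryFactory may also expect a `weightField`; check the reference guide for the version in use.

```xml
<!-- schema.xml: hypothetical stored, unanalyzed copy of the text to suggest -->
<field name="suggest_raw" type="string" indexed="false" stored="true"/>
<!-- "title" is an assumed source field -->
<copyField source="title" dest="suggest_raw"/>

<!-- solrconfig.xml: would replace the existing SpellCheckComponent named "suggest" -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestDictionary</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <!-- reads suggestions from stored field values, so the original case is kept -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest_raw</str>
    <!-- matching uses this type's analyzers (text_suggest with LowerCaseFilterFactory enabled) -->
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">suggestDictionary</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With this in place, a request such as `/suggest?suggest.q=chamä` would be expected to return "Chamäleon" with its capital C, though the returned text may carry highlighting markup around the matched prefix.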
> On 12/09/2014 08:58 AM, Clemens Wyss DEV wrote:
> > Thanks for all the insightful links.
> > I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr but that approach returns search results instead of term suggestions.
> >
> > At the moment I have a solution based on http://wiki.apache.org/solr/TermsComponent. But I might want multi-term suggestions (and fuzziness).
> > Therefore I'd be very interested in how AnalyzingInfixLookupFactory (or any other suggest component) would allow me to
> > a) return case-sensitive suggestions (i.e. as indexed/stored), and
> > b) allow case-insensitive suggestion lookup.
> > Is anybody else doing what I'd like to do?
> >
> > -----Original Message-----
> > From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
> > Sent: Monday, December 8, 2014 19:25
> > To: solr-user@lucene.apache.org
> > Subject: Re: AW: Keeping capitalization in suggestions?
> >
> > Hi Clemens,
> >
> > There are a number of ways to implement autocomplete/suggest. Some of them pull data from indexed terms, so the suggestions will be lowercased (if the index analyzer lowercases). Others pull data from stored values, so capitalization is preserved.
> >
> > Here are great resources on this topic.
> >
> > https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
> > http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> > http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
> >
> > Ahmet
> >
> >
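Ahmet's distinction can be sketched as two suggesters inside one SuggestComponent (Solr 4.7 or later assumed) that differ only in their dictionary source. The suggester names and `indexPath` directories are hypothetical; `suggest` refers to Clemens's indexed field and `title` to an assumed stored field. Each AnalyzingInfixLookupFactory instance gets its own `indexPath` because both would otherwise share the same default on-disk directory.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- built from indexed terms: returns what the index analyzer produced,
       e.g. "chamäleon" when the field's analyzer includes LowerCaseFilterFactory -->
  <lst name="suggester">
    <str name="name">fromIndexedTerms</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggest_terms_idx</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
  </lst>
  <!-- built from stored values: returns the text as it was sent to Solr,
       e.g. "Chamäleon" -->
  <lst name="suggester">
    <str name="name">fromStoredValues</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggest_stored_idx</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
  </lst>
</searchComponent>
```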
> > On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV <clemensdev@mysign.ch> wrote:
> >
> > Although I'm using AnalyzingInfixSuggester, I'm still getting "either/or":
> >
> > When the lowercase filter is active, I always get suggestions, BUT they are lowercased (i.e. "chamäleon").
> > When the lowercase filter is not active, I only get suggestions when querying "Chamä".
> >
> > my solrconfig.xml
> > ...
> >      <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
> >          <lst name="defaults">
> >              <str name="echoParams">none</str>
> >              <str name="wt">json</str>
> >              <str name="indent">false</str>
> >              <str name="spellcheck">true</str>
> >              <str name="spellcheck.dictionary">suggestDictionary</str>
> >              <str name="spellcheck.onlyMorePopular">true</str>
> >              <str name="spellcheck.count">5</str>
> >              <str name="spellcheck.collate">false</str>
> >          </lst>
> >          <arr name="components">
> >              <str>suggest</str>
> >          </arr>
> >      </requestHandler>
> > ...
> >      <searchComponent class="solr.SpellCheckComponent" name="suggest">
> >        <lst name="spellchecker">
> >          <str name="name">suggestDictionary</str>
> >          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >          <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
> >          <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
> >          <str name="field">suggest</str>
> >          <str name="buildOnCommit">true</str>
> >          <str name="storeDir">suggester</str>
> >          <str name="suggestAnalyzerFieldType">text_suggest</str>
> >          <str name="minPrefixChars">4</str>
> >        </lst>
> >      </searchComponent>
> > ...
> >
> > my schema.xml
> > ...
> > <field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
> > ...
> >      <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
> >        <analyzer type="index">
> >          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
> >          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >          <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
> >        </analyzer>
> >        <analyzer type="query">
> >          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
> >          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >          <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
> >        </analyzer>
> >      </fieldType>
> > ...
> >
> >
> > -----Original Message-----
> > From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
> > Sent: Thursday, December 4, 2014 14:05
> > To: solr-user@lucene.apache.org
> > Subject: Re: Keeping capitalization in suggestions?
> >
> > Have a look at AnalyzingInfixSuggester - it does what you want.
> >
> > -Mike
> >
> > On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
> >> When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä", I'd expect to get "Chamäleon" (capitalized).
> >> But what happens is:
> >>
> >> If the lowercase filter (see (1) below) is set:
> >> "chamä" returns "chamäleon"
> >> "Chamä" does not match
> >>
> >> If the lowercase filter (1) is not set:
> >> "Chamä" returns "Chamäleon"
> >> "chamä" does not match
> >>
> >> I guess the lowercase filter should not be set/active, but then how do I get matches even if the search term is lowercased?
> >>
> >> Context:
> >> schema.xml
> >> ...
> >>       <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
> >>         <analyzer type="index">
> >>           <tokenizer class="solr.StandardTokenizerFactory"/>
> >>           <filter class="solr.LowerCaseFilterFactory"/>
> >>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
> >>           <filter class="solr.GermanLightStemFilterFactory"/>
> >>         </analyzer>
> >>         <analyzer type="query">
> >>           <tokenizer class="solr.StandardTokenizerFactory"/>
> >>           <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
> >>           <filter class="solr.LowerCaseFilterFactory"/>
> >>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
> >>           <filter class="solr.GermanLightStemFilterFactory"/>
> >>         </analyzer>
> >>       </fieldType>
> >> ...
> >>       <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
> >>         <analyzer>
> >>           <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
> >>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>           <filter class="solr.LowerCaseFilterFactory"/> <!-- (1) -->
> >>         </analyzer>
> >>       </fieldType>
> >>
> >> solrconfig.xml
> >> -----------------
> >> ...
> >>       <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
> >>           <lst name="defaults">
> >>               <str name="echoParams">none</str>
> >>               <str name="wt">json</str>
> >>               <str name="indent">false</str>
> >>               <str name="spellcheck">true</str>
> >>               <str name="spellcheck.dictionary">suggestDictionary</str>
> >>               <str name="spellcheck.onlyMorePopular">true</str>
> >>               <str name="spellcheck.count">5</str>
> >>               <str name="spellcheck.collate">false</str>
> >>           </lst>
> >>           <arr name="components">
> >>               <str>suggest</str>
> >>           </arr>
> >>       </requestHandler>
> >> ...
> >>       <searchComponent class="solr.SpellCheckComponent" name="suggest">
> >>           <lst name="spellchecker">
> >>               <str name="name">suggestDictionary</str>
> >>               <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >>               <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
> >>               <str name="field">suggest</str>
> >>               <float name="threshold">0.</float>
> >>               <str name="buildOnCommit">true</str>
> >>           </lst>
> >>       </searchComponent>
> >> ...
> >>
>
>
