lucene-solr-user mailing list archives

From Clemens Wyss DEV <clemens...@mysign.ch>
Subject AW: AW: Keeping capitalization in suggestions?
Date Tue, 09 Dec 2014 13:58:36 GMT
Thanks for all the insightful links.
I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr but that
approach returns search results instead of term suggestions.

At the moment I have a solution based on http://wiki.apache.org/solr/TermsComponent . But
I might want multi-term suggestions (and fuzziness).
Therefore I'd be very interested in how AnalyzingInfixLookupFactory (or any other suggest component)
could
a) return case-sensitive suggestions (i.e. as indexed/stored)
b) allow case-insensitive suggestion lookup
Is anybody else doing what I'd like to do?

-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
Sent: Monday, 8 December 2014 19:25
To: solr-user@lucene.apache.org
Subject: Re: AW: Keeping capitalization in suggestions?

Hi Clemens,

There are a number of ways to implement autocomplete/suggest. Some of them pull data from indexed
terms, so the suggestions come back lowercased whenever the index analyzer lowercases. Others pull
data from stored values, so capitalisation is preserved.
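
The difference can be illustrated with a schema fragment. This is a sketch only: the field and type names are borrowed from the configuration quoted further down in this thread, and stored="true" is an assumption for illustration, not the posted setting. The indexed terms pass through the analyzer and lose their capitalisation, while the stored value is kept exactly as submitted.

```xml
<!-- schema.xml (sketch) -->
<!-- stored="true" keeps the original text, e.g. "Chamäleon", next to the indexed terms -->
<field name="suggest" type="text_suggest" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <!-- indexed terms are lowercased here, e.g. "chamäleon" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

An approach that reads the term dictionary would only ever see "chamäleon" with this schema; one that reads the stored value can return "Chamäleon".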

Here are some great resources on this topic:

https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet


On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV <clemensdev@mysign.ch> wrote:

Although I am now using AnalyzingInfixSuggester, I am still getting "either/or" behaviour.

When the lowercase filter is active I always get suggestions, BUT they are lowercased (e.g. "chamäleon").
When the lowercase filter is not active I only get suggestions when querying "Chamä".

my solrconfig.xml
...
    <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
        <lst name="defaults">
            <str name="echoParams">none</str>
            <str name="wt">json</str>
            <str name="indent">false</str>
            <str name="spellcheck">true</str>
            <str name="spellcheck.dictionary">suggestDictionary</str>
            <str name="spellcheck.onlyMorePopular">true</str>
            <str name="spellcheck.count">5</str>
            <str name="spellcheck.collate">false</str>
        </lst>
        <arr name="components">
            <str>suggest</str>
        </arr>
    </requestHandler>
...
    <searchComponent class="solr.SpellCheckComponent" name="suggest">
      <lst name="spellchecker">
        <str name="name">suggestDictionary</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
        <str name="field">suggest</str>  
        <str name="buildOnCommit">true</str>
        <str name="storeDir">suggester</str>
        <str name="suggestAnalyzerFieldType">text_suggest</str>
        <str name="minPrefixChars">4</str>
      </lst>
    </searchComponent>
...

my schema.xml
...
<field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
...
    <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
      </analyzer>
    </fieldType>
...
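
A minimal sketch of one way to get both behaviours, assuming Solr 4.7 or later, where solr.SuggestComponent is available. The idea is that suggestions are read from the stored value of the field (original capitalization), while lookup goes through an analyzer that lowercases. It assumes the `suggest` field is changed to stored="true" and that LowerCaseFilterFactory is re-enabled in `text_suggest`; the suggester name `infixSuggester` is made up for illustration. This has not been tested against the setup above.

```xml
<!-- solrconfig.xml (sketch): SuggestComponent instead of SpellCheckComponent -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <!-- hypothetical name, referenced by suggest.dictionary below -->
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <!-- reads the stored values of "field", so the field must be stored="true" -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <!-- analysis used for matching only; assumes text_suggest lowercases -->
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With such a setup, a request like /suggest?suggest.q=chamä would be expected to return "Chamäleon", since the query is lowercased for matching while the returned text comes from the stored value.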


-----Original Message-----
From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
Sent: Thursday, 4 December 2014 14:05
To: solr-user@lucene.apache.org
Subject: Re: Keeping capitalization in suggestions?

Have a look at AnalyzingInfixSuggester - it does what you want.

-Mike

On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
> When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä", I'd expect to get "Chamäleon" (capitalized).
> But what happens is:
>
> If the LowerCaseFilter (see (1) below) is set:
> "chamä" returns "chamäleon"
> "Chamä" does not match
>
> If the LowerCaseFilter (1) is not set:
> "Chamä" returns "Chamäleon"
> "chamä" does not match
>
> I guess the LowerCaseFilter should not be set/active, but then how do I get matches even if the search term is lowercased?
>
> Context:
> schema.xml
> ...
>      <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.StandardTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
>          <filter class="solr.GermanLightStemFilterFactory"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.StandardTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
>          <filter class="solr.GermanLightStemFilterFactory"/>
>        </analyzer>
>      </fieldType>
> ...
>      <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <filter class="solr.LowerCaseFilterFactory"/> <!-- (1) -->
>        </analyzer>
>      </fieldType>
>
> solrconfig.xml
> -----------------
> ...
>      <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>          <lst name="defaults">
>              <str name="echoParams">none</str>
>              <str name="wt">json</str>
>              <str name="indent">false</str>
>              <str name="spellcheck">true</str>
>              <str name="spellcheck.dictionary">suggestDictionary</str>
>              <str name="spellcheck.onlyMorePopular">true</str>
>              <str name="spellcheck.count">5</str>
>              <str name="spellcheck.collate">false</str>
>          </lst>
>          <arr name="components">
>              <str>suggest</str>
>          </arr>
>      </requestHandler>
> ...
>      <searchComponent class="solr.SpellCheckComponent" name="suggest">
>          <lst name="spellchecker">
>              <str name="name">suggestDictionary</str>
>              <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>              <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
>              <str name="field">suggest</str>
>              <float name="threshold">0.</float>
>              <str name="buildOnCommit">true</str>
>          </lst>
>      </searchComponent>
> ...
>