lucene-solr-user mailing list archives

From sharath jagannath <>
Subject Duplicates in the suggester.
Date Wed, 05 Sep 2012 22:47:44 GMT
I am not sure whether this is a duplicate question. I browsed through the
archive and did not find anything specific to what I was looking for.
I see duplicates in the suggester dictionary when I update a document concurrently.

I am using Solr 3.6.1 with the following configuration for the suggester:

Solr Config:
   <searchComponent name="suggest" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text_auto_suggest</str>
        <lst name="spellchecker">
            <str name="name">suggest</str>
            <str name="field">name_auto</str>
            <str name="buildOnCommit">true</str>
    <requestHandler name="/suggest"
        <lst name="defaults">
            <str name="spellcheck">true</str>
            <str name="spellcheck.dictionary">suggest</str>
            <str name="spellcheck.count">10</str>
        <arr name="components">

        <fieldType name="text_auto_suggest" class="solr.TextField"
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
                <!-- <filter class="solr.LowerCaseFilterFactory" />  -->
                <filter class="solr.ClassicFilterFactory" />
                <!-- <filter class="solr.LengthFilterFactory" min="2" /> -->

            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TrimFilterFactory" />
                <filter class="solr.ClassicFilterFactory" />
                <!-- <filter class="solr.LengthFilterFactory" min="2" /> -->

        <field name="name_auto" type="text_auto_suggest" indexed="true"
            stored="true" multiValued="false" />

Example text I index for the suggester:
foo_bar %|4%|1%|food

%| is used as a combiner between the parts:
Part 1: foo_bar, the name of the entity.
Part 2: 4, the number of activities (application specific) on the entity.
Part 3: 1, the id of the document.
Part 4: food, the category of the entity.
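
For readers who want to reproduce the entry format, the four parts above can be combined and split again roughly as follows. This is only a minimal sketch: the names SuggestEntry, build_entry and parse_entry are hypothetical and do not come from the original setup; only the %| combiner and the order of the parts are taken from the message.

```python
from typing import NamedTuple

SEPARATOR = "%|"  # combiner described in the message


class SuggestEntry(NamedTuple):
    """Hypothetical container for the four parts of one suggester entry."""
    name: str            # Part 1: name of the entity
    activity_count: int  # Part 2: number of activities on the entity
    doc_id: str          # Part 3: id of the document
    category: str        # Part 4: category of the entity


def build_entry(name: str, activity_count: int, doc_id: str, category: str) -> str:
    """Combine the parts into the single string that gets indexed."""
    return f"{name} {SEPARATOR}{activity_count}{SEPARATOR}{doc_id}{SEPARATOR}{category}"


def parse_entry(raw: str) -> SuggestEntry:
    """Split an indexed string back into its parts.

    rsplit with a limit of 3 keeps a name that happens to contain the
    separator intact; this assumes the category never contains it.
    """
    name, count, doc_id, category = raw.rsplit(SEPARATOR, 3)
    return SuggestEntry(name.strip(), int(count), doc_id, category)


if __name__ == "__main__":
    print(build_entry("foo_bar", 4, "1", "food"))
    print(parse_entry("foo_bar %|4%|1%|food"))
```

Note that the activity count is part of the combined string, so every change to the count yields a different string for the same document.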

As I mentioned earlier, I saw duplicates in the spellcheck index
when I updated the documents concurrently:

<arr name="suggestion">
    <str>foo_bar %|4%|1%|food</str>
    <str>foo_bar %|1%|1%|food</str>
    <str>foo_bar %|2%|1%|food</str>
    <str>foo_bar %|3%|1%|food</str>
</arr>
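
The four suggestions listed here differ only in Part 2 (the activity count); the name, document id and category are the same. A small sketch that groups the returned strings by document id makes this visible. The function name group_by_document is hypothetical and the code only inspects the response on the client side; it is not a statement about how the suggester builds its dictionary.

```python
from collections import defaultdict


def group_by_document(suggestions, separator="%|"):
    """Map each document id (Part 3) to the activity counts (Part 2) seen for it."""
    groups = defaultdict(list)
    for raw in suggestions:
        # Parts: name, activity count, document id, category
        _name, count, doc_id, _category = raw.rsplit(separator, 3)
        groups[doc_id].append(int(count))
    return dict(groups)


if __name__ == "__main__":
    suggestions = [
        "foo_bar %|4%|1%|food",
        "foo_bar %|1%|1%|food",
        "foo_bar %|2%|1%|food",
        "foo_bar %|3%|1%|food",
    ]
    # All four entries belong to document id "1".
    print(group_by_document(suggestions))  # {'1': [4, 1, 2, 3]}
```

Since the index analyzer uses KeywordTokenizerFactory, each complete string is a single term, so strings with different counts are distinct terms even though they refer to the same document.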

I do not see duplicates when I update the documents sequentially. I
strongly suspect this is happening because of the way I am combining multiple
fields using %|.
I would appreciate it if somebody could suggest any changes that would
help me with this issue.

