lucene-solr-user mailing list archives

From sharath jagannath <sharathjagann...@gmail.com>
Subject Duplicates in the suggester.
Date Wed, 05 Sep 2012 22:47:44 GMT
I am not sure whether this is a duplicate question. I browsed through the
archive and did not find anything specific to what I was looking for.
I see duplicate entries in the suggester dictionary when I update a document
concurrently.

I am using Solr 3.6.1 with the following configuration for the suggester:

Solr Config:
   <searchComponent name="suggest" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text_auto_suggest</str>
        <lst name="spellchecker">
            <str name="name">suggest</str>
            <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
            <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
            <str name="field">name_auto</str>
            <str name="buildOnCommit">true</str>
        </lst>
    </searchComponent>
    <requestHandler name="/suggest"
        class="org.apache.solr.handler.component.SearchHandler">
        <lst name="defaults">
            <str name="spellcheck">true</str>
            <str name="spellcheck.dictionary">suggest</str>
            <str name="spellcheck.count">10</str>
        </lst>
        <arr name="components">
            <str>suggest</str>
        </arr>
    </requestHandler>

Schema:
        <fieldType name="text_auto_suggest" class="solr.TextField"
            omitNorms="true">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
                <!-- <filter class="solr.LowerCaseFilterFactory" />  -->
                <filter class="solr.ClassicFilterFactory" />
                <!-- <filter class="solr.LengthFilterFactory" min="2" /> -->
            </analyzer>

            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TrimFilterFactory" />
                <filter class="solr.ClassicFilterFactory" />
                <!-- <filter class="solr.LengthFilterFactory" min="2" /> -->
            </analyzer>
        </fieldType>


        <field name="name_auto" type="text_auto_suggest" indexed="true"
            stored="true" multiValued="false" />

Example text I would be indexing for suggester:
foo_bar %|4%|1%|food

%| - used as a combiner,
Part 1: foo_bar, Name of the entity
Part 2: number of activities(application specific) on the entity.
Part 3: id of the document.
Part 4: food, category of the entity.
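
To make the format concrete, here is a minimal Python sketch of how a client
could split one combined value back into its four parts. The names
SuggestEntry and parse_entry are hypothetical, made up for illustration, and
trimming the whitespace around each part is an assumption based on the single
example above.

```python
from typing import NamedTuple

SEPARATOR = "%|"  # the combiner described above


class SuggestEntry(NamedTuple):
    """Hypothetical container for the four parts of a combined value."""
    name: str            # Part 1: name of the entity
    activity_count: int  # Part 2: number of activities on the entity
    doc_id: str          # Part 3: id of the document
    category: str        # Part 4: category of the entity


def parse_entry(value: str) -> SuggestEntry:
    """Split a combined suggester value into its four parts."""
    parts = value.split(SEPARATOR)
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts separated by {SEPARATOR!r}: {value!r}")
    # Assumption: whitespace around each part is not significant.
    name, count, doc_id, category = (p.strip() for p in parts)
    return SuggestEntry(name, int(count), doc_id, category)


entry = parse_entry("foo_bar %|4%|1%|food")
```

Note that a name containing %| itself would break this split, so the sketch
only holds if the combiner never appears inside a field value.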

As I mentioned earlier, I saw duplicates in the spellcheck index when I
updated the documents concurrently:

<arr name="suggestion">
<str>foo_bar %|4%|1%|food</str>
<str>foo_bar %|1%|1%|food</str>
<str>foo_bar %|2%|1%|food</str>
<str>foo_bar %|3%|1%|food</str>
</arr>
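
The four suggestions above share the same name, document id, and category and
differ only in Part 2 (the activity count), so they are not byte-identical
strings. A rough Python sketch for checking this on the client side is below.
The function name group_by_doc_id is hypothetical, and the sketch only groups
the returned strings; it does not change what Solr stores.

```python
from collections import defaultdict

SEPARATOR = "%|"  # the combiner used in the indexed values


def group_by_doc_id(suggestions):
    """Group combined suggestion strings by their document id (Part 3)."""
    groups = defaultdict(list)
    for suggestion in suggestions:
        parts = [p.strip() for p in suggestion.split(SEPARATOR)]
        if len(parts) != 4:
            continue  # skip anything that does not follow the 4-part format
        groups[parts[2]].append(suggestion)
    return dict(groups)


suggestions = [
    "foo_bar %|4%|1%|food",
    "foo_bar %|1%|1%|food",
    "foo_bar %|2%|1%|food",
    "foo_bar %|3%|1%|food",
]
groups = group_by_doc_id(suggestions)
```

Here all four strings land under the single document id "1".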

I do not see duplicates when I update the documents sequentially. I strongly
suspect this is happening because of the way I am combining multiple fields
using %|.
I would appreciate it if somebody could suggest changes that would help me
with this issue.


-- 
Thanks,
Sharath
