lucene-solr-user mailing list archives

From KLessou <kles...@gmail.com>
Subject termFreq always = 1 ?
Date Wed, 01 Oct 2008 13:48:28 GMT
Hi,

I want to index a list of keywords.

When I search for "k1_en:men", I get a lot of documents like these:

DocA :
(k1_en = man;men;Men;business... termFreq=2)
DocB :
(k1_en = man;Men;business... termFreq=1)
DocC :
...
DocD :
...
DocE :
...

But I don't want DocA and DocB to have different termFreq values.

I tried RemoveDuplicatesTokenFilterFactory, but it doesn't seem to help :-/

        <fieldtype name="keywords_en" class="solr.TextField">
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.ISOLatin1AccentFilterFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
                <!--filter class="solr.SnowballPorterFilterFactory" language="English" /-->
                <!--filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/-->

                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="0"
                    generateNumberParts="0"
                    catenateWords="0"
                    catenateNumbers="0"
                    catenateAll="0"
                    />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="index">
                <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
                <filter class="solr.ISOLatin1AccentFilterFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
                <!--filter class="solr.SnowballPorterFilterFactory" language="English" /-->
                <!--filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/-->
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="0"
                    generateNumberParts="0"
                    catenateWords="0"
                    catenateNumbers="0"
                    catenateAll="0"
                    />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldtype>


...


<field name="k1_en" type="keywords_en" indexed="true" stored="true" required="false" />
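
For illustration, here is a minimal Python sketch (not Solr code) of why DocA and DocB end up with different term frequencies. It rests on several assumptions: the index analyzer is reduced to "split on ';' and lowercase", stemming and stop words are ignored, and only the visible part of each field value is used. The function names `analyze` and `term_freq` are made up for this example. It models RemoveDuplicatesTokenFilter as dropping a token only when the same text already occurred at the same position, which is how that filter is documented to behave.

```python
from collections import Counter

def analyze(value):
    """Rough stand-in for the index analyzer (assumption: split on ';' and
    lowercase only; stemming and stop-word removal are left out)."""
    tokens = []
    for position, raw in enumerate(value.split(";")):
        term = raw.strip().lower()
        if term:
            tokens.append((position, term))

    # Mimic RemoveDuplicatesTokenFilter: drop a token only if the same text
    # was already seen at the same position.
    seen = set()
    kept = []
    for position, term in tokens:
        if (position, term) not in seen:
            seen.add((position, term))
            kept.append((position, term))
    return kept

def term_freq(value, term):
    """Count how often `term` survives analysis of `value`."""
    return Counter(text for _, text in analyze(value))[term]

# Only the visible part of each keyword list from the examples above.
doc_a = "man;men;Men;business"
doc_b = "man;Men;business"

print(term_freq(doc_a, "men"))  # 2
print(term_freq(doc_b, "men"))  # 1
```

In this sketch "men" and "Men" sit at different positions after the split, so the duplicate filter keeps both. That matches termFreq=2 for DocA and termFreq=1 for DocB.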


If you have any ideas, thanks in advance.

-- 
~~~~~
| klessou |
~~~~~
