lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: NGram with words
Date Fri, 14 Dec 2012 16:16:18 GMT
Yeah, the positions for ngrams have a good chance of not being what you 
want.

But do try the Solr Admin Analysis web page for that index text and see what 
positions it generates for the sub-words. The two generated words used in 
your query may not have adjacent positions.
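
To illustrate why the positions matter, here is a minimal Python sketch (not Solr code) that imitates an n-gram filter with minGramSize=2 and maxGramSize=8 on "abcdefg 123456". The function names are made up for this example, and the two position schemes are assumptions: whether each gram gets its own consecutive position or all grams of a word share one position depends on the Solr/Lucene version, so the Analysis page remains the authority. The query side of text_ngram has no n-gram filter, so the phrase "defg 1234" needs "1234" exactly one position after "defg".

```python
def ngrams(token, min_size=2, max_size=8):
    """Return all character n-grams of token, shortest grams first."""
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(0, len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def index_positions(text, consecutive):
    """Map each gram to the set of positions it would occupy.

    consecutive=True:  every gram advances the position by one (assumption A)
    consecutive=False: all grams of a word share that word's position (assumption B)
    """
    positions = {}
    pos = 0
    for word in text.lower().split():
        if not consecutive:
            pos += 1  # one position per original word
        for gram in ngrams(word):
            if consecutive:
                pos += 1  # one position per gram
            positions.setdefault(gram, set()).add(pos)
    return positions


def phrase_matches(positions, first, second):
    """An exact phrase needs `second` at the position right after `first`."""
    return any(p + 1 in positions.get(second, set())
               for p in positions.get(first, set()))


text = "abcdefg 123456"
# Assumption A: "defg" lands at position 15 and "1234" at 31, so the phrase fails.
print(phrase_matches(index_positions(text, consecutive=True), "defg", "1234"))   # False
# Assumption B: "defg" is at position 1 and "1234" at 2, so the phrase matches.
print(phrase_matches(index_positions(text, consecutive=False), "defg", "1234"))  # True
```

If the Analysis page shows positions like those in assumption A, the single-term queries (abcd, cdef) still match because they do not depend on positions, while the phrase query does not.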

-- Jack Krupansky

-----Original Message----- 
From: Arkadi Colson
Sent: Friday, December 14, 2012 9:10 AM
To: solr-user@lucene.apache.org
Subject: NGram with words

Hi

When "abcdefg 123456" is indexed in Solr, I would like it to match these queries:

- abcd
- cdef
- abcdefg 123456
- "abcdefg 123456"
- "defg 1234"

The last one does not match.
What am I doing wrong?
My config looks like this:

<field name="smsc_description" type="text" indexed="true"
       stored="false" multiValued="true" omitNorms="true"
       omitPositions="false" omitTermFreqAndPositions="false"/>
<field name="smsc_description_ngram" type="text_ngram" indexed="true"
       stored="false" multiValued="true" omitNorms="true"
       omitPositions="false" omitTermFreqAndPositions="false"/>

<copyField source="smsc_description" dest="smsc_description_ngram"/>

     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <charFilter class="solr.HTMLStripCharFilterFactory"/>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <!--<filter class="solr.SynonymFilterFactory"
                 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
     </fieldType>

     <fieldType name="text_ngram" class="solr.TextField"
                positionIncrementGap="100">
       <analyzer type="index">
         <charFilter class="solr.HTMLStripCharFilterFactory"/>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.NGramFilterFactory" minGramSize="2"
                 maxGramSize="8"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
     </fieldType>

BR,
Arkadi

