lucene-solr-user mailing list archives

From Pete Smith <pete.sm...@lovefilm.com>
Subject using NGramTokenizerFactory for partial matching
Date Tue, 07 Apr 2009 14:36:01 GMT
Hi,

I want to use the NGramTokenizerFactory tokeniser to enable partial
matching on a field in my index. For instance, for the field value:

"Lorem ipsum"

I want it to match "lor", "lorem" and "lorem i". However, I am finding that
it matches the first two but not the third; the whitespace is causing
problems. Here are the relevant parts of my config:

<fieldType name="text_substring" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<field name="title_partial" type="text_substring" indexed="true" stored="true" required="true"/>

I believe it is due to the minGramSize setting, which seems to be applied
to each word separately. Can anyone tell me how I can support what I want
to do?
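
To illustrate the suspicion, here is a rough Python sketch (not Solr code).
The char_ngrams helper is hypothetical: it simply assumes the tokeniser emits
every lowercased substring of 3 to 15 characters, whitespace included. The
real NGramTokenizer and query parser may well behave differently. Under that
assumption "lorem i" is one of the grams of the indexed value, but if the
query is split on whitespace first, the lone "i" is shorter than minGramSize
and produces no grams at all.

```python
def char_ngrams(text, min_size=3, max_size=15):
    """Hypothetical helper: all lowercased substrings of length min_size..max_size."""
    text = text.lower()
    grams = set()
    for size in range(min_size, max_size + 1):
        # range() is empty when the text is shorter than the gram size
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# Grams for the indexed field value, whitespace kept as an ordinary character.
indexed = char_ngrams("Lorem ipsum")

# Case 1: each query is treated as one unsplit string.
for query in ("lor", "lorem", "lorem i"):
    print(repr(query), "found as a single gram:", query in indexed)

# Case 2: the query is split on whitespace before n-gramming (assumed behaviour).
for word in "lorem i".split():
    print(repr(word), "yields", len(char_ngrams(word)), "grams")
```

If the second case is what happens at query time, the "i" part of the query
contributes nothing to match on, which would fit what I am seeing.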

Cheers,
Pete

-- 
Pete Smith
Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com
