lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: help with using ngram analyser needed
Date Fri, 22 Feb 2008 18:00:19 GMT
Hi,

Append &debugQuery=true to your request URLs to see what's going on.
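
For example, a request with the flag appended could be built like this. This is only a minimal sketch: the host, port, `/solr/select` path, and the query string are assumptions for illustration, not taken from your setup.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance (hypothetical; adjust to your install).
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_url(query: str, base: str = SOLR_SELECT) -> str:
    """Return a select URL for `query` with debugQuery=true appended."""
    params = {"q": query, "debugQuery": "true"}
    return base + "?" + urlencode(params)


url = debug_url("text_cn:ab")
print(url)
```

The debug section of the response includes the parsed query, which shows the tokens the query analyzer actually produced.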

Here is something I've used in the past.  I suggest you strip your analyzers down to just the
n-gram tokenizer (no synonym filter or anything else) while you're debugging.

    <!-- n-gram tokenization -->
    <fieldType name="unigram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory" minGramSize="1" maxGramSize="1"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory" minGramSize="1" maxGramSize="1"/>
      </analyzer>
    </fieldType>

...
...
<field name="text_cn" type="unigram" indexed="true" stored="true" required="true"/>
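
To get a feel for what this field type does to a string, here is a rough Python sketch of the resulting token stream. It only illustrates n-gram splitting and is not Lucene's implementation; the function name `ngrams` is made up for this example.

```python
from typing import List


def ngrams(text: str, min_gram: int = 1, max_gram: int = 1) -> List[str]:
    """Return all substrings of length min_gram..max_gram, shortest first."""
    tokens = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            tokens.append(text[start:start + size])
    return tokens


# With minGramSize=1 and maxGramSize=1, every character is its own token.
print(ngrams("東京都"))        # ['東', '京', '都']
print(ngrams("abc", 1, 2))    # ['a', 'b', 'c', 'ab', 'bc']
```

Because the index and query analyzers above are identical, a query string is split into the same single-character tokens as the indexed text.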

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Christian Wittern <cwittern@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 22, 2008 4:32:08 AM
> Subject: help with using ngram analyser needed
> 
> Hi Solr users,
> 
> This is my first posting to this list, after experimenting with Solr
> for a few days.  Please bear with me.
> 
> I am trying to set up a text field for searching CJK text.  At the
> moment, I am trying to use the ngram tokenizer factory, defined in the
> schema.xml as follows:
> 
> [fieldType definition stripped by the archive; only this fragment survives]
> synonyms="variants.txt" ignoreCase="true" expand="true"/>
> 
> I can test this in the administrative interface and it seems to work.
> However, when I do searches, I only get matches for single character
> searches, or for searches that match a complete text field.  What I am
> trying to achieve is a substring match that would match any sequence
> of characters in the target field.
> 
> Any help appreciated,
> 
> Christian
> 
> 
> 
> -- 
> Christian Wittern, Kyoto
> 


