lucene-java-user mailing list archives

From Max Lynch <>
Subject Re: "WI" not Wi-Fi
Date Wed, 08 Sep 2010 22:29:53 GMT
Sorry to be confusing.  I'm actually using both.  I use Solr for its web
application features and Lucene for my background searches.  In this case,
the issue is with my Lucene side of things.

The analysis feature on the Solr admin page shows the analysis is correct
and wi-fi no longer matches "WI".  Here is the schema snippet for this type.
I changed generateWordParts="1" to "0" and that fixed the Solr side of
things:

        <fieldType name="text_standard" class="solr.TextField">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="0" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                        ignoreCase="true" expand="true"/>
                <filter class="solr.StopFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="0"
                        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English"/>
            </analyzer>
        </fieldType>

However, I am using a StandardAnalyzer on the index beneath Solr and my hits
are still showing up with Wi-Fi.  I was curious if there was something
special I had to do with the StandardAnalyzer on the Lucene side of things
in order to remove the word split functionality.  I know it's kind of an odd
relationship with Solr and Lucene, but I haven't had any other issues so
far.

Please let me know if you think this belongs on the Solr list instead.

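To illustrate the behaviour in question, here is a minimal plain-Java sketch.
It does not use Lucene at all: HyphenSplitDemo, splitOnPunctuation and
splitOnWhitespace are hypothetical names, and neither method reproduces
StandardAnalyzer's actual grammar. It only shows why a tokenizer that breaks
on '-' lets a query for "WI" hit "Wi-Fi", while one that breaks on whitespace
alone does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy tokenizers for illustration only; not part of Lucene or Solr. */
final class HyphenSplitDemo {

    /** Breaks on every non-alphanumeric character, so "Wi-Fi" yields "wi" and "fi". */
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Breaks on whitespace only and trims punctuation from token edges, so "Wi-Fi" stays "wi-fi". */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            String trimmed = part.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "Free Wi-Fi downtown";
        // A lowercased query term "wi" is found only in the first token list.
        System.out.println(splitOnPunctuation(doc)); // [free, wi, fi, downtown]
        System.out.println(splitOnWhitespace(doc));  // [free, wi-fi, downtown]
    }
}
```

On the Lucene side the analogous choice would be a whitespace-based analyzer
in place of StandardAnalyzer, applied the same way at index and query time.
Lucene ships a WhitespaceAnalyzer, but it does not lowercase, so treat this as
a direction to verify rather than a tested configuration.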

On Wed, Sep 8, 2010 at 5:23 PM, Erick Erickson <> wrote:

> I'm a bit confused, this is the Lucene list, but it sounds like you're using
> SOLR. If you are, could you post the relevant parts of your schema,
> especially the field type definition for the field in question? If you are,
> why not just take WordDelimiterFilterFactory out of your field type
> definition?
> The analysis page will help you lots here if you're in SOLR.
> StandardAnalyzer could well be splitting on '-' if you're using that.
> Best
> Erick
> On Wed, Sep 8, 2010 at 5:27 PM, Max Lynch <> wrote:
> > Hi,
> > I am using the StandardAnalyzer, but I am not interested in converting words
> > like Wi-Fi into "Wi" and "Fi".  Rather, "WI" is an important word for my
> > users (indicating the state of Wisconsin) and I need "WI" to only match the
> > distinct word.
> >
> > I know in Solr I can set generateWordParts="0" for my
> > solr.WordDelimiterFilterFactory, but for some reason when I read the index
> > with Lucene the tokens are still separated.
> >
> > Is there a way to disable this?  Thanks.
> >
