lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: "WI" not Wi-Fi
Date Wed, 08 Sep 2010 22:23:18 GMT
I'm a bit confused: this is the Lucene list, but it sounds like you're using
Solr. If you are, could you post the relevant parts of your schema,
especially the field type definition for the field in question? And if
WordDelimiterFilterFactory is what's splitting the tokens, why not just take
it out of your field type definition?

The analysis page will help you a lot here if you're using Solr.

StandardAnalyzer could well be splitting on '-' if you're using that.
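
To make the effect concrete, here is a rough, self-contained sketch of why a hyphen-splitting tokenizer makes "WI" match "Wi-Fi". It does not use Lucene at all: the two helper methods are hypothetical stand-ins for "split on punctuation, then lowercase" and "split on whitespace only, then lowercase", and the sample text is made up. Real analyzers apply more rules than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HyphenSplitSketch {

    // Hypothetical stand-in for an analyzer that breaks on any
    // non-alphanumeric character (including '-') and lowercases.
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for whitespace-only tokenization plus
    // lowercasing, which keeps "Wi-Fi" together as one token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Free Wi-Fi in Madison WI";
        // "wi" appears twice here, so a query for WI also hits Wi-Fi.
        System.out.println(splitOnPunctuation(text));
        // "wi-fi" stays whole, so a query for WI only hits the state.
        System.out.println(splitOnWhitespace(text));
    }
}
```

If the second behaviour is what you want, the thing to check is which analyzer is applied at index time and at query time, since both need to agree.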

Best
Erick

On Wed, Sep 8, 2010 at 5:27 PM, Max Lynch <ihasmax@gmail.com> wrote:

> Hi,
> I am using the StandardAnalyzer, but I am not interested in converting
> words like Wi-Fi into "Wi" and "Fi".  Rather, "WI" is an important word
> for my users (indicating the state of Wisconsin) and I need "WI" to only
> match the distinct word.
>
> I know in Solr I can set generateWordParts="0" for my
> solr.WordDelimiterFilterFactory, but for some reason when I read the index
> with Lucene the tokens are still separated.
>
> Is there a way to disable this?  Thanks.
>
