lucene-java-user mailing list archives

From Michael Ryan <mr...@moreover.com>
Subject RE: Case insensitive StringField?
Date Tue, 21 May 2013 14:16:45 GMT
Here's what we use for this:

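    <fieldType name="caseInsensitiveString" class="solr.TextField" indexed="true" stored="true"
               omitNorms="true" sortMissingLast="true" positionIncrementGap="100">
      <!-- No type="index"/"query" attribute: the same chain runs at index and query time -->
      <analyzer>
        <!-- Emit the entire field value as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- Lowercase that token so matching ignores case -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>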

    <field name="someField" type="caseInsensitiveString" omitTermFreqAndPositions="true"/>

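As far as I know, StringField does not use analyzers at all: it is indexed as a single, untokenized term exactly as given, so any analyzer you configure is simply ignored for that field.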

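KeywordTokenizerFactory provides the "exact phrase" behavior by emitting the whole field value as one token, and LowerCaseFilterFactory lowercases that token. Because the same analyzer runs at index and query time, "NEW YORK" and "New York" both become the single term "new york".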
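To make the difference concrete, here is a small plain-Java model of the three setups discussed in this thread. It is only a sketch: it does not call Lucene or Solr, the class and method names are hypothetical, and the whitespace split merely approximates a real tokenizer. It shows why a StringField term "NEW YORK" never equals a lowercased query "new york", why a tokenized field matches individual words, and why a keyword tokenizer plus a lowercase filter yields one case-normalized term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model, NOT the Lucene/Solr API: each method mimics how one
// field setup turns a field value into indexed terms. Names are hypothetical.
class CaseInsensitiveKeywordDemo {

    // StringField: indexed but not tokenized, so no analyzer runs and the raw
    // value becomes a single term, case and all.
    static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    // Tokenized TextField: rough stand-in for a whitespace tokenizer followed
    // by a lowercase filter; each word becomes its own term.
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Keyword tokenizer + lowercase filter: the whole value is one token,
    // which is then lowercased.
    static List<String> keywordLowercaseTerms(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Single-term lookup: does the query term equal one of the indexed terms?
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String fieldValue = "NEW YORK";
        // The same keyword+lowercase chain applied on the query side.
        String queryTerm = keywordLowercaseTerms("New York").get(0);

        System.out.println(matches(stringFieldTerms(fieldValue), queryTerm));      // false
        System.out.println(matches(tokenizedTerms(fieldValue), "york"));           // true
        System.out.println(matches(keywordLowercaseTerms(fieldValue), queryTerm)); // true
        System.out.println(matches(keywordLowercaseTerms(fieldValue), "york"));    // false
    }
}
```

In a real index the fieldType above produces this effect through its analyzer chain; the model only illustrates which terms end up being compared.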

-Michael

-----Original Message-----
From: Shahak Nagiel [mailto:snagiel@yahoo.com] 
Sent: Tuesday, May 21, 2013 10:06 AM
To: java-user@lucene.apache.org
Subject: Case insensitive StringField?

It appears that StringField instances are treated as literals, even though my analyzer lowercases
(on both the write and read sides). So, for example, I can match with a term query (e.g. "NEW YORK"),
but only if the case matches. If I use a QueryParser (or MultiFieldQueryParser), it never works,
because these query values are lowercased and don't match.

I've found that using a TextField instead works, presumably because it's tokenized and processed
correctly by the write analyzer. However, I would prefer that queries match against the entire,
exact phrase ("NEW YORK") rather than against the individual tokens ("NEW" or "YORK").

What's the solution here?

Thanks in advance.
