lucene-java-user mailing list archives

From: Shahak Nagiel
Subject: Re: Case insensitive StringField?
Date: Wed, 22 May 2013 02:09:29 GMT

Jack / Michael: Thanks, but the query parser still seems to be tokenizing the query?

public class StringPhraseAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Emit the entire field value as a single token.
        Tokenizer tok = new KeywordTokenizer(reader);
        // Lower-case the token, then trim leading/trailing whitespace.
        TokenFilter filter = new LowerCaseFilter(Version.LUCENE_41, tok);
        filter = new TrimFilter(filter, true);
        return new TokenStreamComponents(tok, filter);
    }
}

Analyzer analyzer = new StringPhraseAnalyzer();

// using this analyzer, add document to index with city TextField (value "NEW YORK")

QueryParser qp = new QueryParser(Version.LUCENE_41, "city", analyzer); 

Query q = qp.parse("new york");
System.out.println ("Query: " + q);

results in...
Query: city:new city:york    // I expected "city:new york"

...and no matches.  Is a QueryParser the wrong way to generate the query for this type of

Thanks again!
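
A likely explanation for the output above: in Lucene 4.x the classic QueryParser splits the query text on whitespace before it hands each piece to the analyzer, so the KeywordTokenizer never sees "new york" as one string. The sketch below is a plain-Java model of that ordering, not Lucene API; the class and method names (QuerySplitSketch, splitThenAnalyze, analyzeWhole) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the query side; this is NOT Lucene API.
final class QuerySplitSketch {

    // Stand-in for the keyword-style analyzer: one trimmed, lower-cased token.
    static String analyze(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    // Models a parser that splits on whitespace first, then analyzes each piece.
    static List<String> splitThenAnalyze(String field, String queryText) {
        List<String> clauses = new ArrayList<>();
        for (String piece : queryText.trim().split("\\s+")) {
            clauses.add(field + ":" + analyze(piece));
        }
        return clauses;
    }

    // Models analyzing the whole query text as a single unit.
    static List<String> analyzeWhole(String field, String queryText) {
        List<String> clauses = new ArrayList<>();
        clauses.add(field + ":" + analyze(queryText));
        return clauses;
    }
}
```

With the query text "new york", splitThenAnalyze yields two clauses (city:new and city:york), matching the printed query, while analyzeWhole yields the single clause city:new york that the index actually contains. Commonly suggested workarounds are to quote the phrase or escape the space in the query string, or to skip the parser and build a TermQuery from the normalized value; these are worth testing against your own Lucene version.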

From: Jack Krupansky
Sent: Tuesday, May 21, 2013 10:22 AM
Subject: Re: Case insensitive StringField?

To be clear, analysis is not supported on StringField (or any non-tokenized 
field). But the good news is that by using the keyword tokenizer 
(KeywordTokenizer) on a TextField, you can get the same effect.

That will preserve the entire input as a single token. You may want to 
include filters to trim exterior white space and normalize interior white 
space.
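
To make the intended effect concrete, here is a minimal plain-Java sketch of what such a chain should produce for a field value: one token, trimmed, with interior whitespace collapsed and case folded. It models the result only and does not use Lucene classes; KeywordNormalizerSketch and normalize are hypothetical names, and the whitespace collapsing is an assumption about what "normalize" means here.

```java
import java.util.Locale;

// Plain-Java model of a keyword-style analysis chain; this is NOT Lucene API.
final class KeywordNormalizerSketch {

    // Treats the whole field value as one token, then trims the ends,
    // collapses interior whitespace runs to one space, and lower-cases it.
    static String normalize(String fieldValue) {
        String trimmed = fieldValue.trim();
        String collapsed = trimmed.replaceAll("\\s+", " ");
        return collapsed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Both spellings should reduce to the same single term.
        System.out.println(normalize("NEW YORK"));
        System.out.println(normalize("  New   York "));
    }
}
```

If the same normalization runs at index time and at query time, "NEW YORK" and "  New   York " both map to the single term "new york", which is the case-insensitive exact match the original question asks for.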

-- Jack Krupansky

-----Original Message----- 
From: Shahak Nagiel
Sent: Tuesday, May 21, 2013 10:06 AM
Subject: Case insensitive StringField?

It appears that StringField instances are treated as literals, even though 
my analyzer lower-cases (on both write and read sides).  So, for example, I 
can match with a term query (e.g. "NEW YORK"), but only if the case matches. 
If I use a QueryParser (or MultiFieldQueryParser), it never works because 
these query values are lowercased and don't match.

I've found that using a TextField instead works, presumably because it's 
tokenized and processed correctly by the write analyzer.  However, I would 
prefer that queries match against the entire/exact phrase ("NEW YORK"), 
rather than among the tokens ("NEW" or "YORK").

What's the solution here?

Thanks in advance. 
