lucene-dev mailing list archives

From Doug Cutting <>
Subject RE: Case Sensitivity
Date Thu, 24 Jan 2002 18:43:53 GMT
> From: Brian Goetz
> >
> > This question is frequent enough that we should probably fix it.
> > Perhaps a method should be added to Analyzer:
> >   public boolean isLowercased(String fieldName);
> > When this is true, the query parser could lowercase prefix and range
> > query terms.  Fellow Lucene developers, what do you think of that?
> Something should be done, but I'm not sure this is the best way to do
> it.  Perhaps extend Analyzer to work in two modes:
> "tokenization-only" and "tokenization + term normalization".

I'm not sure that would fix the problem, since range and prefix query terms
might reasonably not even be something that the tokenizer would return.
Imagine you're indexing dates in the form "YYYY/MM/DD" and someone wants to
do a range search from "2000/10" through "2000/11", but your tokenizer will
barf if it sees an incomplete date.  Do you need to write your tokenizer to
handle ungrammatical input?  Is this scenario a stretch?  Perhaps.
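
To make the date scenario concrete, here is a minimal sketch. The class
DateRangeSketch, its strict tokenizeDate step, and the inRange helper are all
hypothetical and not part of Lucene. The sketch assumes range endpoints are
compared as plain term strings, so "2000/10" is a usable lower bound even
though a strict date tokenizer would reject it.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in Lucene.
final class DateRangeSketch {
    private static final Pattern FULL_DATE = Pattern.compile("\\d{4}/\\d{2}/\\d{2}");

    // Stand-in for a strict tokenizer: accepts only complete YYYY/MM/DD dates.
    static String tokenizeDate(String text) {
        if (!FULL_DATE.matcher(text).matches()) {
            throw new IllegalArgumentException("not a complete date: " + text);
        }
        return text;
    }

    // Range test on raw term strings: lexicographic, inclusive at both ends.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Indexed terms are complete dates; the range endpoints are not.
        System.out.println(inRange("2000/10/15", "2000/10", "2000/11")); // true
        System.out.println(inRange("2000/09/30", "2000/10", "2000/11")); // false

        // Passing an endpoint through the strict tokenizer fails.
        try {
            tokenizeDate("2000/10");
        } catch (IllegalArgumentException e) {
            System.out.println("tokenizer rejected: " + e.getMessage());
        }
    }
}
```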

I think the real reason I don't like your proposal is not so much that it
might not be fully general (although it might not be) but that it seems like
a lot more work than adding a simple predicate to analyzers, and I'm not
convinced there are any other uses for the API you suggest.  So why not just
go with something simple that directly addresses the need?
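
For reference, a rough sketch of how the simple predicate might be used. Only
the isLowercased(String fieldName) signature comes from the proposal quoted
above. The FieldCaseInfo interface (a stand-in for Analyzer), the
PrefixTermNormalizer class, and the field names "body" and "id" are
assumptions made up for illustration.

```java
import java.util.Locale;

// Hypothetical stand-in for Analyzer; only the proposed predicate is modelled.
interface FieldCaseInfo {
    boolean isLowercased(String fieldName);
}

// Hypothetical query-parser helper for terms that bypass tokenization.
final class PrefixTermNormalizer {
    private final FieldCaseInfo analyzer;

    PrefixTermNormalizer(FieldCaseInfo analyzer) {
        this.analyzer = analyzer;
    }

    // Lowercase a prefix or range term only when the field's indexed
    // terms are known to be lowercased; otherwise leave it untouched.
    String normalize(String fieldName, String rawTerm) {
        return analyzer.isLowercased(fieldName)
                ? rawTerm.toLowerCase(Locale.ROOT)
                : rawTerm;
    }
}

final class LowercasePredicateDemo {
    public static void main(String[] args) {
        // Assumption: "body" is indexed lowercased, "id" is indexed verbatim.
        FieldCaseInfo analyzer = field -> field.equals("body");
        PrefixTermNormalizer normalizer = new PrefixTermNormalizer(analyzer);

        System.out.println(normalizer.normalize("body", "Luc"));  // luc
        System.out.println(normalizer.normalize("id", "ABC-1"));  // ABC-1
    }
}
```

The query parser never has to run the term through the tokenizer here, which
is the point of the predicate: it answers one question about the field and
nothing more.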

