lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Confused about non-tokenized fields
Date Fri, 27 May 2005 17:47:01 GMT

On May 27, 2005, at 11:22 AM, Max Pfingsthorn wrote:

> Hi!
> In my application, I index some strings (like filenames)  
> untokenized, meaning via
> doc.add(new Field(FIELD,VALUE,false,true,false));
> When I later take a look at it with Luke, I still get tokens of the  
> filenames (like "news" instead of "news-item.xml") in the list of  
> most frequent terms. Shouldn't I get only the
> complete filenames there??

Perhaps that "news" term is coming from a different field?  Are you  
sure that you're seeing the filename field tokenized?  Your usage of  
the field constructor looks fine to me and should not tokenize.

> Also, how do I search case-insensitive over this kind of field?

Lucene is case-sensitive: an untokenized field is indexed exactly as
given, so a query term has to match it character for character.  I
suggest lowercasing the field value before indexing, and lowercasing
the query term before searching.  This is the simplest suggestion,
but you may need some other technique, such as having different
fields (or different indexes), to deal with case-sensitivity issues.
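
As a rough illustration of the lowercasing suggestion, here is a minimal sketch in plain Java. It does not use the Lucene API at all: the HashSet is only a stand-in for the exact-match terms of an untokenized field, and the class name LowercaseKeywordSketch and its methods are hypothetical names made up for this example. The point is simply that the same normalization has to be applied on both the indexing side and the query side.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: a HashSet stands in for the exact-match terms
// of one untokenized field. This is not Lucene code.
public class LowercaseKeywordSketch {
    private final Set<String> terms = new HashSet<>();

    // Normalize case the same way at index time and at query time.
    // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
    public static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // "Index" a filename as one whole, lowercased term.
    public void index(String filename) {
        terms.add(normalize(filename));
    }

    // Exact match against the whole term, after lowercasing the query.
    public boolean matches(String query) {
        return terms.contains(normalize(query));
    }

    public static void main(String[] args) {
        LowercaseKeywordSketch idx = new LowercaseKeywordSketch();
        idx.index("News-Item.xml");
        System.out.println(idx.matches("NEWS-ITEM.XML")); // true
        System.out.println(idx.matches("news"));          // false: only the whole value is a term
    }
}
```

In a real application the normalized string would presumably be what gets passed as the value to the Field constructor shown in the question, and the normalized query string is what a TermQuery on that field would be built from.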

