lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eleanor Joslin <...@decisionsoft.com>
Subject Re: Using a QueryParser with an untokenized field?
Date Fri, 01 Feb 2008 12:46:29 GMT
Thank you, this was exactly what I needed.  So "tokenizing" really 
denotes a more general process that can involve normalizing the case or 
whatever else can be done with a filter.  This is where I was confused.

Eleanor

Jan Peter Stotz wrote:
> Hi Eleanor.
> 
>> In my Lucene index there's a field that contains the local names of 
>> XML elements, one name per document.  Users can enter arbitrary 
>> queries for this field, so I'm using a QueryParser.
> 
>> From reading around it looks as if the field needs to be tokenized, 
>> but since the field's content is always a single term, is this really 
>> necessary?  
> 
> You are right, your field is already tokenized, but from what I know the 
> main difference is that untokenized fields do not pass your selected 
> analyzer when being added to the index. If your analyzer for example 
> incorporates the LowerCaseFilter,  the field will be converted into 
> lower case before it is indexed. When using the same analyzer for your 
> QueryParser this will allow you to perform case insensitive query.
> 
> If you add the field untokenized and your Analyzer (at query time) 
> incorporates the LowerCaseFilter, you will be unable find elements that 
> contain upper characters.
> 
> Jan
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


-- 
Eleanor Joslin, Software Development   DecisionSoft Ltd.
Telephone: +44-1865-203192             http://www.decisionsoft.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message