lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: Tokenizer
Date Mon, 30 Jul 2007 16:15:01 GMT
Hello,

> I have two questions.
> 
> First, Is there a tokenizer that takes every word and simply 
> makes a token
> out of it? 

org.apache.lucene.analysis.WhitespaceTokenizer

> So it looks for two white spaces and takes the characters
> between them and makes a token out of them?
> 
> If this tokenizer exists, is there a difference between doing that and
> simply storing the field in the document with Field.Index = 
> UN_TOKENIZED?

Yes certainly it is different. UN_TOKENIZED is as it says, taking a String for example and
put it "AS IS", as one single TERM in your index. For example you might want to do this when
you want to sort on caption of a document, or title. TOKENIZED in combination with the org.apache.lucene.analysis.WhitespaceTokenizer
tokenizes your string and indexes.

Regards Ard

> 
> --JP
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message