lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dragan Jotanovic" <Dragan.Jotano...@diosphere.com>
Subject Re: Tokenizing text custom way
Date Tue, 25 Nov 2003 12:06:19 GMT
Hi Rene,

> I've had the same problem. On some fields, I do
> employ a "NonTokenizer" now,
> which looks similar to the other tokenizers except for:

> protected boolean isTokenChar(char c)
>  {
>    return true;
>  }

> So "time out" would be one token.

This is OK solution in case that I have only "time out" in a field, but I
will have dozens of words in one field of a document. Like I said in
previous letter, I would have "man, people, time out, sun" and all those
words would be in one letter and all should be "searchable" (I need to
tokenize them like "man" "people" "time out" "sun").

Your solution isn't doing tokenizing, right?

Dragan Jotanovic





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message