lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dragan Jotanovic" <>
Subject Tokenizing text
Date Tue, 25 Nov 2003 09:50:57 GMT
Hi. I need to tokenize text while indexing but I don't want space to be delimiter. Delimiter
should be my custom character (for example comma). I understand that I would probably need
to implement my own analyzer, but could someone help me where to start. Is there any other
way to do this without writing custom analyzer?

This is what I want to achieve.
If I have some text that will be indexed like following:

man, people, time out, sun

and if I enter 'time' as a search word, I don't want to get "time out" in results. I need
exact keyword matching. I would achieve this if I tokenize "time out" as one token while idexing.

Maybe someone had similar problem? If someone knows how to handle this, please help me.

Dragan Jotanovic

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message