lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Tokenize on another character
Date Mon, 31 Mar 2008 13:05:47 GMT
I'm confused on the use case you're trying to implement,
could you add a bit more explanation?

In particular, do you ever want ROCK to match
ROCK AND ROLL? If you want both, that is
some searches match partial keywords and some
match entire keywords, I recommend you create a
second field in your document KEYWORD_EXACT or
some such and index it UN_TOKENIZED (storage is
optional). Also, you can index the KEYWORD field
as TOKENIZED. Then, when you want to match exactly,
you search against the first field, when you want to search
on any piece, search the second.

If this is completely off base, could you post the use-cases
you're interested in?


On Mon, Mar 31, 2008 at 5:42 AM, <> wrote:

> Hello
> I just joined the list and need some help.
> I have a database of music tracks.These tracks have been added to an
> index. They are classified using keywords, so a track can have up to
> 20 keywords assigned to them. I took the keywords and create a
> "keyword" FIELD which was not stored and tokenized. The problem is
> this... if a user searches for a specific keyword such as "ROCK", it
> is finding as well as ROCK tracks, ROCK AND ROLL tracks. I realise
> this is due to the tokenization of the keyword FIELD. My question is
> this, how can i stop the analyser from tokenizing on the space
> character and instead tokenize on one i specifiy. That way, if i chose
> to tokenize on a comma, i could add a comma at the end of every
> keyword. Or have i gone about this the wrong way?
> Many thanks, any insight will be appreciated.
> Fiaz
> -----------------------------------------
> Email sent from
> Virus-checked <> using
> McAfee(R) Software and scanned for spam
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message