lucene-java-user mailing list archives

From Rob Young
Subject Better analysis of hyphenated words
Date Thu, 27 Oct 2005 16:13:56 GMT

I'm using StandardAnalyzer during indexing and I have noticed that it
splits hyphenated words in two, dropping the hyphen. This is messing up
some of my search results. I would like to keep using StandardAnalyzer
because it's very good on the whole; however, I would like to add an
extra term in these cases. I am fine doing everything except figuring
out when StandardTokenizer has split a hyphenated word. All I get is the
individual tokens, each with type ALPHANUM. Can anyone think of a way I
can do this without having to dive into StandardTokenizer?
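
To make the problem concrete, here is a minimal sketch of one way the split
could be detected from the outside. It assumes each token carries start and
end character offsets into the original text. Tok, HyphenJoiner and
withHyphenatedTerms are hypothetical names made up for illustration; this is
plain Java, not Lucene's API, and it is untested against StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenJoiner {
    /** Hypothetical stand-in for an analyzer token: text plus offsets into the source. */
    public static final class Tok {
        public final String text;
        public final int start; // inclusive offset of the first character
        public final int end;   // exclusive offset, one past the last character

        public Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns every token's text in order and, wherever two neighbouring
     * tokens are separated in the source by exactly one '-', also emits
     * the joined "left-right" term right after the left-hand token.
     */
    public static List<String> withHyphenatedTerms(String source, List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Tok cur = tokens.get(i);
            out.add(cur.text);
            if (i + 1 < tokens.size()) {
                Tok next = tokens.get(i + 1);
                // Exactly one character lies between the two tokens...
                boolean singleCharGap = next.start == cur.end + 1;
                // ...and that character is a hyphen in the original text.
                if (singleCharGap && source.charAt(cur.end) == '-') {
                    out.add(cur.text + "-" + next.text);
                }
            }
        }
        return out;
    }
}
```

The check relies only on offsets and the original text, so the tokenizer
itself stays untouched. A word with two hyphens ("a-b-c") would produce the
pairs "a-b" and "b-c" rather than the full word; handling that case is left
out of the sketch.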

I have looked at the source for StandardTokenizer and I really really 
really don't want to have to go there :/

