lucene-java-user mailing list archives

From Rob Young <bubble...@gmail.com>
Subject Better analysis of hyphenated words
Date Thu, 27 Oct 2005 16:13:56 GMT
Hi,

I'm using StandardAnalyzer during indexing, and I have noticed that it
splits hyphenated words in two, dropping the hyphen. This is messing up
some of my search results. I would like to keep using StandardAnalyzer
because it's very good on the whole, but I would like to add an extra
term in these cases. I can handle everything except detecting when
StandardTokenizer has split a hyphenated word: all I get are the
individual tokens, each with type <ALPHANUM>. Can anyone think of a way
I can do this without having to dive into StandardTokenizer?
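
To make the goal concrete, here is a rough sketch of the kind of check I
have in mind. It assumes each token carries start and end offsets into
the original text (Lucene's Token exposes startOffset() and endOffset()),
and that the extra term I want is the whole hyphenated word. Tok and
hyphenatedTerms are made-up names for illustration, not Lucene classes,
and the token list is written out by hand rather than produced by
StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenJoinSketch {

    /** Hypothetical stand-in for a Lucene token: term text plus offsets into the original text. */
    public static final class Tok {
        final String text;
        final int start; // inclusive offset of the first character
        final int end;   // exclusive offset, one past the last character

        public Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns one extra term for each pair of adjacent tokens that are
     * separated by exactly one character in the original text, where
     * that character is a hyphen.
     */
    public static List<String> hyphenatedTerms(String original, List<Tok> tokens) {
        List<String> extra = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            Tok a = tokens.get(i);
            Tok b = tokens.get(i + 1);
            boolean oneCharGap = b.start == a.end + 1;
            if (oneCharGap && original.charAt(a.end) == '-') {
                // Rebuild the hyphenated word from the original text.
                extra.add(original.substring(a.start, b.end));
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        String text = "a well-known e-mail list";
        // Hand-written tokens, assumed to match how the words get split.
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("a", 0, 1));
        tokens.add(new Tok("well", 2, 6));
        tokens.add(new Tok("known", 7, 12));
        tokens.add(new Tok("e", 13, 14));
        tokens.add(new Tok("mail", 15, 19));
        tokens.add(new Tok("list", 20, 24));
        System.out.println(hyphenatedTerms(text, tokens)); // prints [well-known, e-mail]
    }
}
```

This only looks at adjacent pairs, so a word with several hyphens would
come out as overlapping two-part terms rather than one long term, and it
needs the original text to be available wherever the check runs.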

I have looked at the source for StandardTokenizer and I really really 
really don't want to have to go there :/

Cheers
Rob
