lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Hyphenated word
Date Mon, 13 Jun 2005 14:52:19 GMT

On Jun 13, 2005, at 10:55 AM, Andy Roberts wrote:

> On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
>> I see, the list of exceptions makes this a lot more complicated than I
>> thought... Thanks a lot, Erik!
> I expect you'll need to do some pre-processing. Read your text into a
> buffer, line by line. If a given line ends with a hyphen, you can
> manipulate the buffer to merge the hyphenated tokens.

The problem I encountered when indexing "Lucene in Action" was that I
couldn't blindly concatenate two tokens just because the first ended
with a hyphen. Some lines ended with a hyphen that was a dash, not
part of a hyphenated word.
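
To make the failure concrete, here is a minimal sketch of the blind
merge, assuming the text arrives as a list of lines. The class and
method names (NaiveHyphenMerger, merge) are hypothetical and are not
part of Lucene or of the code used for the book.

```java
import java.util.List;

// Hypothetical helper for illustration only; not a Lucene class.
class NaiveHyphenMerger {

    /**
     * Joins the lines into one string. A line that ends with '-' loses
     * that hyphen and is glued to the next line without a space.
     */
    static String merge(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        boolean joinDirectly = true; // nothing emitted yet, so no separator
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (!joinDirectly) {
                sb.append(' ');
            }
            if (line.endsWith("-")) {
                // Blindly assume a word was split across the line break.
                sb.append(line, 0, line.length() - 1);
                joinDirectly = true;
            } else {
                sb.append(line);
                joinDirectly = false;
            }
        }
        return sb.toString();
    }
}
```

With the lines "the index is opti-" and "mized after each run" this
yields "the index is optimized after each run". With a dash typed as
two hyphens, "keep it simple--" followed by "or not", it yields
"keep it simple-or not", which is the kind of damage described above.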

I'm sure other, more clever implementations could do this better, for
instance by looking up the concatenated word in a dictionary.
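
A rough sketch of that dictionary idea, under several assumptions: the
dictionary is a plain Set of lowercase words supplied by the caller,
words are separated by single spaces, and a line whose candidate word
is not in the dictionary is left untouched. DictionaryHyphenMerger and
its merge method are hypothetical names, not a Lucene API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not a Lucene class.
class DictionaryHyphenMerger {

    /**
     * Joins the lines into one string. A trailing hyphen is removed only
     * when the fragment before it plus the first word of the next line
     * forms a word found in the dictionary (lowercase entries assumed).
     */
    static String merge(List<String> lines, Set<String> dictionary) {
        StringBuilder out = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (out.length() == 0) {
                out.append(line);
                continue;
            }
            if (out.charAt(out.length() - 1) == '-') {
                // Fragment before the hyphen: last word emitted so far.
                int start = out.lastIndexOf(" ") + 1;
                String head = out.substring(start, out.length() - 1);
                // First word of the current line.
                int space = line.indexOf(' ');
                String tail = (space < 0) ? line : line.substring(0, space);
                // Strip trailing punctuation before the lookup.
                String candidate = (head + tail)
                        .replaceAll("[^\\p{L}]+$", "")
                        .toLowerCase(Locale.ROOT);
                if (dictionary.contains(candidate)) {
                    out.setLength(out.length() - 1); // drop the hyphen
                    out.append(line);                // glue without a space
                    continue;
                }
            }
            // Not a known split word: keep the text as it was.
            out.append(' ').append(line);
        }
        return out.toString();
    }
}
```

Given a dictionary containing "optimized", the lines "the index is
opti-" and "mized after each run" become "the index is optimized after
each run", while "keep it simple--" followed by "or not" stays as
"keep it simple-- or not". The result is only as good as the
dictionary: a split word that is missing from it is left hyphenated.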

