lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Hypenated word
Date Mon, 13 Jun 2005 14:52:19 GMT

On Jun 13, 2005, at 10:55 AM, Andy Roberts wrote:

> On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
>
>> I see, the list of exceptions makes this a lot more complicated  
>> than I
>> thought... Thanks a lot, Erik!
>>
>>
>
> I expect you'll need to do some pre-processing. Read in your text  
> into a
> buffer, line-by-line. If a given line ends with a hyphen, you can  
> manipulate
> the buffer to merge the hyphenated tokens.

The problem I encountered when indexing "Lucene in Action" was that I  
couldn't just blindly concatenate two tokens because the first ends  
with a hyphen.  Some lines ended with a hyphen because it was a dash,  
not a hyphenated word.

I'm sure other more clever implementations could do this better, by  
looking up the concatenated word in a dictionary for instance.

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message