lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Markus Wiederkehr <markus.wiederk...@gmail.com>
Subject Re: Hypenated word
Date Mon, 13 Jun 2005 14:52:23 GMT
On 6/13/05, Andy Roberts <mail@andy-roberts.net> wrote:
> On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > I see, the list of exceptions makes this a lot more complicated than I
> > thought... Thanks a lot, Erik!
> >
> 
> I expect you'll need to do some pre-processing. Read in your text into a
> buffer, line-by-line. If a given line ends with a hyphen, you can manipulate
> the buffer to merge the hyphenated tokens.

As Erik wrote it is not that simple, unfortunately. For example, if
one line ends with "read-" and the next line begins with "only" the
correct word is "read-only" not "readonly". Whereas "work-" and "ing"
should of course be merged into "working".

Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message