lucene-java-user mailing list archives

From Markus Wiederkehr <>
Subject Hyphenated word
Date Mon, 13 Jun 2005 11:08:04 GMT

I am working on an application that has to index OCR text from scanned
books. Naturally, many words are hyphenated across line breaks.
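
To make the kind of merging concrete, here is a minimal sketch in plain
Java that works on the raw text before it reaches any analyzer. It is
not a Lucene API: the class name Dehyphenator and the method merge are
hypothetical, and it assumes that every hyphen followed directly by a
line break marks a split word. That assumption also strips the hyphen
from genuine compounds such as "well-known" when they happen to break
at the hyphen.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: rejoins words that were split
// with a hyphen at the end of a line in OCR output.
final class Dehyphenator {

    // Matches a hyphen that directly follows a letter, then optional
    // trailing blanks, a line break, optional indentation, and is
    // followed by another letter. Only the hyphen, the blanks and the
    // line break are consumed; the letters are matched by lookaround.
    private static final Pattern LINE_END_HYPHEN =
            Pattern.compile("(?<=\\p{L})-[ \\t]*\\r?\\n[ \\t]*(?=\\p{L})");

    private Dehyphenator() {
    }

    // Returns the text with every line-end hyphenation removed, so the
    // two halves of the word become a single run of letters.
    static String merge(String text) {
        return LINE_END_HYPHEN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String ocr = "scanned books are hyphen-\nated across lines";
        System.out.println(merge(ocr));
    }
}
```

Hyphens inside a line are left alone, so "well-known" in the middle of a
line is unchanged. A dictionary lookup on the joined word would be one
way to decide whether the hyphen should be kept.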

I wonder if there is already an Analyzer or maybe a TokenFilter that
can merge those syllables back into whole words? It looks like Erik
Hatcher uses something like that at

Thanks in advance,

