Hello,
I work on an application that has to index OCR texts of scanned books.
Naturally there occur many words that are hyphenated across lines.
I wonder if there is already an Analyzer or maybe a TokenFilter that
can merge those syllables back into whole words? It looks like Erik
Hatcher uses something like that at http://www.lucenebook.com/.
Thanks in advance,
Markus
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|