lucene-java-user mailing list archives

From Michael Prichard <michael_prich...@mac.com>
Subject extracting non-english text from word, pdf, etc....??
Date Wed, 01 Aug 2007 05:44:50 GMT
I know how to extract English text with POI, PDFBox, and so on. Now I want to start indexing
non-English text, such as French and Spanish. Which extraction libraries can I use for
that?

I want to handle these formats:

Excel
Word
PowerPoint
PDF
HTML
RTF

Thanks!
Michael
