commons-user mailing list archives

From Martin Grogan <>
Subject [Fileupload] Reading MS-Word docs
Date Wed, 07 Jun 2006 13:40:53 GMT
Hi all,
Forgive the slightly off-topic question, but if someone here has done 
this before, I'd appreciate a pointer.
I'm using FileUpload to let users upload MS-Word documents, and I would
like to strip out the text for indexing.
I have done this for PDF files using PDFBox and am looking for something 
similar for Word documents. I have looked at Lucene, but it looks too 
big and heavy for what we need.
Anyone have any ideas?
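For comparison, here is a minimal, hypothetical sketch of a crude dependency-free fallback. It is a "strings"-style scan over the raw bytes of the upload, for example the `byte[]` returned by Commons FileUpload's `FileItem.get()`. It is NOT a Word parser. It ignores the OLE2/.doc structure entirely and collects runs of printable ASCII stored either as single bytes or as UTF-16LE code units. Expect it to miss non-ASCII characters and to pick up some binary junk. The class and method names (`DocTextSketch`, `extractTextRuns`, `concat`) and the `minRun` threshold are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, crude heuristic: NOT a real .doc parser. It only collects runs
// of printable ASCII stored as single bytes or as UTF-16LE code units.
class DocTextSketch {

    // Printable ASCII range 0x20..0x7E
    private static boolean isPrintable(byte b) {
        int v = b & 0xFF;
        return v >= 0x20 && v <= 0x7E;
    }

    // Keep the current run if it is long enough, then reset the buffer
    private static void flush(StringBuilder sb, int minRun, List<String> runs) {
        if (sb.length() >= minRun) {
            runs.add(sb.toString());
        }
        sb.setLength(0);
    }

    // Returns every run of at least minRun printable characters:
    // first the single-byte runs, then the UTF-16LE runs.
    static List<String> extractTextRuns(byte[] data, int minRun) {
        List<String> runs = new ArrayList<>();
        StringBuilder sb = new StringBuilder();

        // Pass 1: single-byte (8-bit) text
        for (byte b : data) {
            if (isPrintable(b)) {
                sb.append((char) (b & 0xFF));
            } else {
                flush(sb, minRun, runs);
            }
        }
        flush(sb, minRun, runs);

        // Pass 2: UTF-16LE text, i.e. a printable byte followed by 0x00
        int i = 0;
        while (i + 1 < data.length) {
            int start = i;
            while (i + 1 < data.length && isPrintable(data[i]) && data[i + 1] == 0) {
                sb.append((char) (data[i] & 0xFF));
                i += 2;
            }
            flush(sb, minRun, runs);
            if (i == start) {
                i++; // no UTF-16LE run starts here; advance one byte
            }
        }
        return runs;
    }

    // Joins byte arrays; used only to build synthetic input for the demo
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Synthetic bytes standing in for an uploaded file
        byte[] data = concat(
            "Hello Word".getBytes(StandardCharsets.US_ASCII),
            new byte[] {0, 0, 1, 2},
            "Index me".getBytes(StandardCharsets.UTF_16LE),
            new byte[] {(byte) 0xFF, 0x00, 0x07});
        System.out.println(extractTextRuns(data, 4)); // prints [Hello Word, Index me]
    }
}
```

Raising `minRun` trades recall for less binary noise. A library that actually understands the Word file format will give far cleaner text than this heuristic.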

Martin Grogan
Keizen Software

