commons-user mailing list archives

From Martin Grogan <mgro...@mobilegolfers.net>
Subject Re: [Fileupload] Reading MS-Word docs
Date Wed, 07 Jun 2006 14:59:06 GMT
Hi Mark,
I found POI after I posted the question, but it seems to be in a bit 
of a mess. Is HWPF or HDF the one to use? I'm still trying to piece 
things together from scattered documentation, but it's slow going.
Thanks,
Martin


Mark wrote:

> Apache POI is probably your best bet.
>
>
> On 6/7/06, Martin Grogan <mgrogan@keizensoftware.com> wrote:
>
>> Hi all,
>> Forgive the slightly off-topic question, but if someone here has done
>> this before, I'd appreciate a pointer.
>> I am using FileUpload to let a user upload an MS-Word document and would
>> like to be able to strip out the text for indexing.
>> I have done this for PDF files using PDFBox and am looking for something
>> similar for Word documents. I have looked at Lucene, but it looks too
>> big and heavy for what we need.
>> Anyone have any ideas?
>> Thanks,
>> Martin
>>
>> -- 
>> ------------
>> Martin Grogan
>> Keizen Software
>>
>> mgrogan@keizensoftware.com
>> www.keizensoftware.com
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: commons-user-help@jakarta.apache.org
>>
>>
>
