lucene-java-user mailing list archives

From "Pete Lewis" <p...@uptima.co.uk>
Subject Re: Parsers
Date Wed, 28 May 2003 13:01:35 GMT
Hi Adriano

Thanks.  Code samples would be nice :)

Will come back if I find something for .ppt.

Pete

----- Original Message -----
From: "Adriano Labate" <labate@verticali.com>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
Sent: Wednesday, May 28, 2003 1:03 PM
Subject: RE : Parsers


The www.textmining.org text extractors work very well for Word and PDF
documents.
They use both PDFBox and POI.

For Excel, using POI directly is very easy. Tell me if you want to see
code samples.

I'm looking for a PowerPoint text extractor myself, if you know of one...

Adriano Labate


-----Original Message-----
From: Pete Lewis [mailto:pete@uptima.co.uk]
Sent: Wednesday, 28 May 2003 12:48
To: Lucene Users List
Subject: Parsers


Hi all,

I have a rather nice HTML parser that I got from SourceForge.  Does
anyone know of any good parsers for PDF and Microsoft Office Suite
files (.doc, .ppt, .xls, etc.)?  Any help would be much appreciated.

Pete Lewis




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




