lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrien Grand <jpou...@gmail.com>
Subject Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content
Date Sun, 27 Jan 2013 17:53:11 GMT
Have you tried using the PDFParser [1] and the OfficeParser [2]
classes from Tika?

This question seems to be more appropriate for the Tika user mailing list [3]?

[1] http://tika.apache.org/1.3/api/org/apache/tika/parser/pdf/PDFParser.html#parse(java.io.InputStream,
org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata,
org.apache.tika.parser.ParseContext)
[2] http://tika.apache.org/1.3/api/org/apache/tika/parser/microsoft/OfficeParser.html#parse(java.io.InputStream,
org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata,
org.apache.tika.parser.ParseContext)
[3] http://tika.apache.org/mail-lists.html

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message