poi-user mailing list archives

From: Nick Burch <apa...@gagravarr.org>
Subject: RE: New POI problem
Date: Fri, 15 Apr 2016 17:44:45 GMT
On Fri, 15 Apr 2016, Thaddaeus Fillmore - US wrote:
> Thanks for the reply! I actually got it to work using ExtractorFactory
> though. (I had a typo in the path to the jar files.) Is Tika just for
> Office documents or can it also read other formats? Ideally I'd like
> something that could process plain text, Word documents, PDFs, and
> images, but as of right now I'm able to handle all of those formats
> using a variety of means.

Apache Tika can probably get text out of your kitchen sink! Especially if
it's Panamanian... ;-)

Nick

Currently supported formats = http://tika.apache.org/1.12/formats.html
Tika's use on the Panama Papers = https://source.opennews.org/en-US/articles/people-and-tech-behind-panama-papers/


