chemistry-dev mailing list archives

From Florian Müller <f...@apache.org>
Subject Re: Getting Fulltext from Document object
Date Tue, 14 Feb 2012 14:20:50 GMT

 Hi Sebastian,

 Check out Apache Tika [1].
 Pass the document's content stream to Tika, and Tika should be able to
 give you all kinds of information about the content, including the text.
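
 A minimal sketch of that approach, assuming the OpenCMIS client API and
 both tika-core and tika-parsers are on the classpath. The class name
 FullTextExtractor and its method names are hypothetical, chosen only for
 illustration. It uses the Tika facade's parseToString() and the
 Document's getContentStream().

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.chemistry.opencmis.client.api.Document;
import org.apache.chemistry.opencmis.commons.data.ContentStream;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical helper class; not part of OpenCMIS or Tika.
public class FullTextExtractor {

    // The Tika facade detects the format and picks a parser automatically.
    private static final Tika TIKA = new Tika();

    static {
        // By default parseToString() truncates its result after 100,000
        // characters; -1 lifts that limit. Keep a limit if documents may
        // be very large.
        TIKA.setMaxStringLength(-1);
    }

    // Extracts plain text from any stream Tika can parse; closes the stream.
    public static String extractText(InputStream stream)
            throws IOException, TikaException {
        try (InputStream in = stream) {
            return TIKA.parseToString(in);
        }
    }

    // Extracts plain text from a CMIS document's content stream.
    public static String extractText(Document doc)
            throws IOException, TikaException {
        ContentStream content = doc.getContentStream();
        if (content == null) {
            // The document has no content attached (assumption: treat as empty).
            return "";
        }
        return extractText(content.getStream());
    }
}
```

 With a Document obtained from a CMIS session, calling
 FullTextExtractor.extractText(doc) would then return the text as a
 String. If metadata is needed as well, the lower-level Tika parser API
 described at [1] is probably the better fit.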


 - Florian


 [1] http://tika.apache.org/1.0/parser.html


> Dear all,
>
> We are trying to mine documents that we retrieve via CMIS.
>
> What's the best way to get the full text (as a String) out of a Document
> object?
>
> best regards and thanks
>
> Sebastian

