lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Suhas Indra <su...@robosoftin.com>
Subject PDF Text extraction
Date Fri, 27 Dec 2002 06:34:11 GMT
Hello List

I am using PDFBox to index some of the PDF documents. The parser works fine
and I can read the summary. But the contents are displayed as
java.io.InputStream.

When I try the following:
System.out.println(doc.getField("contents")) (where doc is the Document
object)

The result will be:

Text<contents:java.io.InputStreamReader@127dc0>

I want to print the extracted data.

Can anyone please let me know how to extract the contents?

Regards

Suhas 



--------------------------------------------------------------
Robosoft Technologies - Partners in Product Development
 
 
 
 
 




--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message