Date: Fri, 27 Dec 2002 09:03:47 -0500 (EST)
From: Ben Litchfield
To: Lucene Users List
Subject: Re: PDF Text extraction

You need to do something like

    //first get the document field
    Field contentsField = doc.getField( "contents" );
    //Then get the reader from the field
    BufferedReader contentsReader = new BufferedReader( contentsField.readerValue() );
    //finally dump the contents of the reader to System.out
    String line = null;
    while( (line = contentsReader.readLine() ) != null )
    {
        System.out.println( line );
    }

I have not tested if this compiles but it should be pretty close.

Ben Litchfield

On Fri, 27 Dec 2002, Suhas Indra wrote:

> Hello List
>
> I am using PDFBox to index some of the PDF documents. The parser works fine
> and I can read the summary. But the contents are displayed as
> java.io.InputStream.
>
> When I try the following:
> System.out.println(doc.getField("contents")) (where doc is the Document
> object)
>
> The result will be:
>
> Text
>
> I want to print the extracted data.
>
> Can anyone please let me know how to extract the contents?
>
> Regards
>
> Suhas
>
> --------------------------------------------------------------
> Robosoft Technologies - Partners in Product Development
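
For anyone who wants to try the reader-draining loop from the reply without a Lucene index at hand, here is a minimal, self-contained sketch. The class name ReaderDump and the method readAll are made up for illustration, and a StringReader stands in for the Reader that contentsField.readerValue() would return, so this only demonstrates the java.io part of the answer, not the Lucene calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderDump {

    // Hypothetical helper: reads every line from the reader and returns the
    // text with each line terminated by '\n'. Closes the reader when done.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this StringReader is a stand-in for the Reader that
        // would come from contentsField.readerValue() in the reply above.
        Reader contents = new StringReader("first line\nsecond line");
        System.out.print(readAll(contents));
    }
}
```

Two things to keep in mind if you adapt this. A Reader can only be consumed once, so draining it before the document is handed to the IndexWriter would leave nothing to index. Also, readLine() throws IOException, so the calling method has to declare or catch it.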