Date: Fri, 16 May 2003 20:35:34 -0400 (EDT)
From: Ben Litchfield
To: Lucene Users List
Subject: Re: Indexing encrypted PDF documents using PDFBox-0.6.1
In-Reply-To: <20030516153157.87057.qmail@web40111.mail.yahoo.com>

This seems to be more of a PDFBox issue than a Lucene issue. Please post
the stack trace on the PDFBox mailing list. Also, 0.6.2 is available, which
fixes some bugs.

http://www.sourceforge.net/projects/pdfbox
http://www.pdfbox.org

Ben

On Fri, 16 May 2003, Shoba Ramachandran wrote:

> Hi,
>
> Has anybody successfully indexed encrypted PDF documents?
>
> I get NullPointerException at
>
> decryptor.decryptDocument( "" );
>
> Thanks
> Shoba
>
> Code:
> --------
> public static Document pdfDocument(Document document, File file) throws Exception
> {
>     PDDocument pdDocument = null;
>     try
>     {
>         PDFParser parser = new PDFParser(new FileInputStream(file));
>         parser.parse();
>
>         pdDocument = parser.getPDDocument();
>         System.out.println("pdDocument : " + pdDocument);
>
>         if( pdDocument.isEncrypted() )
>         {
>             DecryptDocument decryptor = new DecryptDocument( pdDocument );
>             System.out.println("decryptor : " + decryptor);
>             //Just try using the default password and move on
>             decryptor.decryptDocument( "" );
>         }
>
>         //create a tmp output stream with the size of the content.
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         OutputStreamWriter writer = new OutputStreamWriter( out );
>         PDFTextStripper stripper = new PDFTextStripper();
>         stripper.writeText( pdDocument.getDocument(), writer );
>         writer.close();
>
>         byte[] contents = out.toByteArray();
>         out.close();
>         InputStreamReader input = new InputStreamReader( new ByteArrayInputStream( contents ) );
>
>         // Add the tag-stripped contents as a Reader-valued Text field so it will
>         // get tokenized and indexed.
>         document.add(Field.Text("Contents", input ));
>
>         int summarySize = Math.min( contents.length, 200 );
>         // Add the summary as an UnIndexed field, so that it is stored and returned
>         // with hit documents for display.
>         //System.out.println(" ************** PDF summary : " + new String( contents, 0, summarySize ));
>         document.add(Field.UnIndexed("Summary", new String( contents, 0, summarySize ) ) );
>
>         //add the properties
>         //addProperties(document, pdDocument);
>     }
>     catch( CryptographyException e )
>     {
>         throw new IOException("Error decrypting document(" + file.getPath() + "): " + e.getMessage());
>     }
>     catch( InvalidPasswordException e )
>     {
>         //they didn't supply a password and the default of "" was wrong.
>         throw new IOException("The document(" + file.getPath() + ") is encrypted and will not be indexed.");
>     }
>     finally
>     {
>         if(pdDocument!=null) pdDocument.close();
>     }
>     return document;
> }

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
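
For anyone following up on the request for a stack trace: below is a
minimal sketch, using only java.io, of capturing the full trace as text so
it can be pasted into a mail. The class name StackTraceCapture and the
helper stackTraceOf are made up for illustration and are not part of PDFBox
or Lucene. The null dereference in main only stands in for the failing
decryptDocument call; it is not a claim about where the real
NullPointerException originates.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceCapture {

    /**
     * Hypothetical helper: renders a throwable as the same text that
     * Throwable.printStackTrace() would write to System.err.
     */
    public static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            // Stand-in for the failing call: dereferencing a null
            // reference raises NullPointerException.
            String missing = null;
            missing.length();
        } catch (NullPointerException e) {
            // The complete text, including every "at ..." frame, is what
            // is worth posting, not just the exception name.
            System.out.println(stackTraceOf(e));
        }
    }
}
```

In the quoted method, the same helper could be called from a catch block
around the decryption step, assuming the surrounding code is otherwise left
as posted.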