lucene-java-user mailing list archives

From Ben Litchfield <...@csh.rit.edu>
Subject Re: Indexing encrypted PDF documents using PDFBox-0.6.1
Date Sat, 17 May 2003 00:35:34 GMT

This seems to be more of a PDFBox issue than a Lucene issue.  Please post
the full stack trace on the PDFBox mailing list.  Also, 0.6.2 is available,
which fixes some bugs.

http://www.sourceforge.net/projects/pdfbox
http://www.pdfbox.org

Ben
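
For posting the stack trace, here is a minimal sketch of one way to capture
it as text so it can be pasted into a mail. The class and method names
(StackTraceCapture, stackTraceOf) are made up for illustration and are not
part of PDFBox or Lucene; only the standard java.io classes are used.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceCapture {
    /** Returns the stack trace of t, including any "Caused by" chain, as text. */
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            // Stand-in for the failing call; any null reference will do here.
            Object decryptor = null;
            decryptor.toString();
        } catch (NullPointerException e) {
            // Print (or log) the whole trace, not just e.getMessage().
            System.err.println(stackTraceOf(e));
        }
    }
}
```

Wrapping the call to pdfDocument(...) in a similar try/catch should show
which line inside the library throws, which is more useful to report than
the exception name alone.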


On Fri, 16 May 2003, Shoba Ramachandran wrote:

> Hi,
>
> Has anybody successfully indexed encrypted PDF
> documents?
>
> I get a NullPointerException at
>
> decryptor.decryptDocument( "" );
>
> Thanks
> Shoba
>
> Code:
> --------
> public static Document pdfDocument(Document document, File file) throws Exception
>     {
>         PDDocument pdDocument = null;
>         try
>         {
>             PDFParser parser = new PDFParser(new FileInputStream(file));
>             parser.parse();
>
>             pdDocument = parser.getPDDocument();
>             System.out.println("pdDocument :  " + pdDocument);
>
>             if( pdDocument.isEncrypted() )
>             {
>                 DecryptDocument decryptor = new DecryptDocument( pdDocument );
>                 System.out.println("decryptor :  " + decryptor);
>                 //Just try using the default password and move on
>                 decryptor.decryptDocument( "" );
>             }
>
>             //create a tmp in-memory output stream to hold the extracted text.
>             ByteArrayOutputStream out = new ByteArrayOutputStream();
>             OutputStreamWriter writer = new OutputStreamWriter( out );
>             PDFTextStripper stripper = new PDFTextStripper();
>             stripper.writeText( pdDocument.getDocument(), writer );
>             writer.close();
>
>             byte[] contents = out.toByteArray();
>             out.close();
>             InputStreamReader input = new InputStreamReader( new ByteArrayInputStream( contents ) );
>
>             // Add the extracted text as a Reader-valued Text field so it will
>             // get tokenized and indexed.
>             document.add(Field.Text("Contents", input ));
>
>             int summarySize = Math.min( contents.length, 200 );
>             // Add the summary as an UnIndexed field, so that it is stored and returned
>             // with hit documents for display.
>             //System.out.println(" ************** PDF summary : " + new String( contents, 0, summarySize ));
>             document.add(Field.UnIndexed("Summary", new String( contents, 0, summarySize ) ) );
>
>             //add the properties
>             //addProperties(document, pdDocument);
>         }
>         catch( CryptographyException e )
>         {
>             throw new IOException("Error decrypting document(" + file.getPath() + "): " + e.getMessage());
>         }
>         catch( InvalidPasswordException e )
>         {
>             //they didn't supply a password and the default of "" was wrong.
>             throw new IOException("The document(" + file.getPath() + ") is encrypted and will not be indexed.");
>         }
>         finally
>         {
>             if(pdDocument!=null) pdDocument.close();
>         }
>         return document;
>     }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
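
A side note on the summary field in the quoted code, unrelated to the
NullPointerException: the summary is cut from the first 200 bytes of the
encoded text, which can split a multi-byte character and depends on the
platform default encoding. A rough sketch of cutting on characters instead
is below. SummarySketch and summarize are hypothetical names, and the sketch
assumes the extracted text is already available as a String.

```java
class SummarySketch {
    /** Returns at most maxChars leading characters of text, never splitting a surrogate pair. */
    static String summarize(String text, int maxChars) {
        if (text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        // Back off by one if the cut would fall between the two halves of a surrogate pair.
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }
}
```

One way to get the text as a String would be to let the stripper write into
a java.io.StringWriter instead of a byte stream, assuming writeText accepts
any Writer as it appears to in the code above.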
