lucene-java-user mailing list archives

From Shoba Ramachandran <shoba_duru...@yahoo.com>
Subject Indexing encrypted PDF documents using PDFBox-0.6.1
Date Fri, 16 May 2003 15:31:57 GMT
Hi,

Has anybody successfully indexed encrypted PDF
documents?

I get a NullPointerException at

decryptor.decryptDocument( "" );

Thanks
Shoba

Code:
--------
    public static Document pdfDocument(Document document, File file) throws Exception
    {
        PDDocument pdDocument = null;
        try
        {
            PDFParser parser = new PDFParser(new FileInputStream(file));
            parser.parse();

            pdDocument = parser.getPDDocument();
            System.out.println("pdDocument :  " + pdDocument);

            if( pdDocument.isEncrypted() )
            {
                DecryptDocument decryptor = new DecryptDocument( pdDocument );
                System.out.println("decryptor :  " + decryptor);
                //Just try using the default password and move on
                decryptor.decryptDocument( "" );
            }

            //create a temporary in-memory output stream for the extracted text.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            OutputStreamWriter writer = new OutputStreamWriter( out );
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.writeText( pdDocument.getDocument(), writer );
            writer.close();

            byte[] contents = out.toByteArray();
            out.close();
            InputStreamReader input = new InputStreamReader( new ByteArrayInputStream( contents ) );

            // Add the tag-stripped contents as a Reader-valued Text field so it will
            // get tokenized and indexed.
            document.add(Field.Text("Contents", input));

            int summarySize = Math.min( contents.length, 200 );
            // Add the summary as an UnIndexed field, so that it is stored and returned
            // with hit documents for display.
            //System.out.println(" ************** PDF summary : " + new String( contents, 0, summarySize ));
            document.add(Field.UnIndexed("Summary", new String( contents, 0, summarySize ) ) );

            //add the properties
            //addProperties(document, pdDocument);
        }
        catch( CryptographyException e )
        {
            throw new IOException("Error decrypting document(" + file.getPath() + "): " + e.getMessage());
        }
        catch( InvalidPasswordException e )
        {
            //they didn't supply a password and the default of "" was wrong.
            throw new IOException("The document(" + file.getPath() + ") is encrypted and will not be indexed.");
        }
        finally
        {
            if(pdDocument!=null) pdDocument.close();
        }
        return document;
    }
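
For anyone reproducing this, here is a minimal sketch of one way to isolate the decryption step, so that a failure there (including an unchecked NullPointerException, which the two catch blocks above do not handle) is recorded per file instead of aborting the whole indexing run. It does not explain or fix the exception itself. The Decryptor interface and the tryDecrypt helper are hypothetical stand-ins invented for illustration; they are not part of PDFBox, and the sketch uses newer Java syntax than the code above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: isolate the decryption step so one bad file cannot abort an indexing run. */
final class DecryptGuard {

    /** Hypothetical stand-in for the library's decryption call; not a PDFBox type. */
    interface Decryptor {
        void decrypt(String password) throws Exception;
    }

    /**
     * Tries the given password and reports whether decryption succeeded.
     * Any failure, including a RuntimeException such as NullPointerException,
     * is recorded in errors instead of propagating to the caller.
     */
    static boolean tryDecrypt(String path, Decryptor decryptor, String password, List<String> errors) {
        try {
            decryptor.decrypt(password);
            return true;
        } catch (Exception e) {
            errors.add(path + ": " + e.getClass().getSimpleName());
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        // A decryptor that succeeds, and one that fails the way the post describes.
        boolean ok = tryDecrypt("open.pdf", pw -> { }, "", errors);
        boolean failed = tryDecrypt("locked.pdf", pw -> { throw new NullPointerException(); }, "", errors);
        System.out.println(ok + " " + failed + " " + errors);
    }
}
```

In the method above, the equivalent would be to skip text extraction for any file where the guarded call returns false and to log the recorded error for later inspection.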



