pdfbox-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PDFBOX-4370) Jempbox's ResourceEvent crazily slow to initialize
Date Wed, 07 Nov 2018 18:38:00 GMT
Tim Allison created PDFBOX-4370:

             Summary: Jempbox's ResourceEvent crazily slow to initialize
                 Key: PDFBOX-4370
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4370
             Project: PDFBox
          Issue Type: Task
          Components: JempBox
    Affects Versions: 1.8.16
            Reporter: Tim Allison
         Attachments: slow.zip

In our new batch of regression files on Tika, one of the new PDFs caused a timeout.  This
is not an infinite loop, but it does take several minutes. This may not be fixable.

Admittedly, the XMP is large, and there are quite a few events.

This is the code that triggers the problem:
            XMPMetadata xmp = XMPMetadata.load(is);
            XMPSchemaMediaManagement mmSchema = xmp.getMediaManagementSchema();
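
A rough way to put a number on those two calls is a small wall-clock timer. This is only a sketch: the `LoadTiming` class and `timeMillis` helper are hypothetical names, not part of Jempbox, and the sleeping workload is a stand-in so the example runs with the JDK alone.

```java
import java.util.concurrent.Callable;

class LoadTiming {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload. With Jempbox on the classpath and an InputStream `is`
        // for the XMP packet, the lambda body would instead be the two lines above:
        //   XMPMetadata xmp = XMPMetadata.load(is);
        //   return xmp.getMediaManagementSchema();
        long elapsed = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

Timing `load` and `getMediaManagementSchema` in separate `timeMillis` calls would show which of the two accounts for the delay.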

The slow part _seems_ to be setting the attribute namespace when creating a new ResourceEvent.
When I comment out the following in ResourceEvent's initializer, processing is quite
fast (1 second):

                NAMESPACE );
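
To check whether a namespace declaration set through the DOM is expensive on its own, independent of Jempbox, one option is to time repeated `setAttributeNS` calls with the JDK's built-in parser. The sketch below makes several assumptions: the class and method names are made up, the `li` element name and `stEvt` prefix are illustrative, the namespace URI is the one the XMP specification gives for ResourceEvent, and none of it is copied from ResourceEvent's source. It does not reproduce the attached file, and the DOM implementation on Tika's classpath may behave differently from the JDK default.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceAttrTiming {

    // Namespace reserved by the W3C for xmlns declarations.
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";
    // XMP ResourceEvent (stEvt) namespace, assumed here for illustration.
    static final String STEVT_URI = "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#";

    /** Builds a root element with `count` children, each declaring xmlns:stEvt. */
    static Element buildWithDeclarations(int count) throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < count; i++) {
            Element child = doc.createElement("li");
            root.appendChild(child);
            // The call under suspicion: declare the prefix on each new element.
            child.setAttributeNS(XMLNS_URI, "xmlns:stEvt", STEVT_URI);
        }
        return root;
    }

    public static void main(String[] args) throws Exception {
        int count = 10_000; // arbitrary; the real file's event count is not stated
        long start = System.nanoTime();
        buildWithDeclarations(count);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(count + " declarations in " + elapsedMs + " ms");
    }
}
```

If this runs quickly even for large counts, the cost is more likely tied to the structure of the specific document or the DOM implementation in use than to `setAttributeNS` in general.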
