pdfbox-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PDFBOX-4370) Jempbox's ResourceEvent crazily slow to initialize
Date Wed, 07 Nov 2018 18:38:00 GMT
Tim Allison created PDFBOX-4370:
-----------------------------------

             Summary: Jempbox's ResourceEvent crazily slow to initialize
                 Key: PDFBOX-4370
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4370
             Project: PDFBox
          Issue Type: Task
          Components: JempBox
    Affects Versions: 1.8.16
            Reporter: Tim Allison
         Attachments: slow.zip

In our new batch of regression files on Tika, one of the new PDFs caused a timeout. This
is not an infinite loop, but processing does take several minutes. This may not be fixable.

Admittedly, the XMP is large, and there are quite a few events.

This is the code that triggers the problem.
{noformat}
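            // Load the XMP packet, then ask the media management schema for its history.
            XMPMetadata xmp = XMPMetadata.load(is);
            XMPSchemaMediaManagement mmSchema = xmp.getMediaManagementSchema();
            // getHistory() creates the ResourceEvent objects, which is where the time goes.
            mmSchema.getHistory();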
{noformat}

The slow part _seems_ to be setting the attribute namespace when creating a new ResourceEvent.
When I comment out the following call in ResourceEvent's initializer, processing is quite
fast (1 second).

{noformat}
            parent.setAttributeNS( 
                XMPSchema.NS_NAMESPACE, 
                "xmlns:stEvt", 
                NAMESPACE );
{noformat}
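
To check whether setAttributeNS is costly on its own, outside Jempbox, a small timing harness against the JDK's built-in DOM can help. The sketch below is not the Jempbox code: the class and method names are hypothetical, the document is built from scratch rather than parsed from the XMP in slow.zip, and the two namespace constants are assumed to match XMPSchema.NS_NAMESPACE and ResourceEvent's NAMESPACE. It may well not reproduce the slowdown, which could depend on the DOM implementation or on how the XMP was parsed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical harness; not part of Jempbox or PDFBox.
final class SetAttributeNsTiming {
    // Assumed to equal XMPSchema.NS_NAMESPACE.
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    // Assumed to equal ResourceEvent's NAMESPACE constant.
    static final String ST_EVT_NS = "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#";

    /** Builds a root element with one child per event and declares xmlns:stEvt on each child. */
    static Element buildHistory(int events) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable DOM parser", e);
        }
        Element root = doc.createElement("history");
        doc.appendChild(root);
        for (int i = 0; i < events; i++) {
            Element event = doc.createElement("li");
            root.appendChild(event);
            // Same call shape as the one commented out in ResourceEvent.
            event.setAttributeNS(XMLNS_NS, "xmlns:stEvt", ST_EVT_NS);
        }
        return root;
    }

    public static void main(String[] args) {
        // Time the loop at a few sizes to see how the cost grows with the event count.
        for (int events : new int[] {1_000, 10_000, 100_000}) {
            long start = System.nanoTime();
            buildHistory(events);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(events + " events: " + elapsedMs + " ms");
        }
    }
}
```

If this loop stays fast at event counts comparable to the problem file, the cost is more likely tied to the parsed document than to the call itself; timing mmSchema.getHistory() directly on the attached file would then be the next thing to compare.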



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


