pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Closed] (PDFBOX-556) Performance regression from 0.7.3 to 0.8.0
Date Fri, 18 May 2012 15:37:07 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-556.

    Resolution: Won't Fix
      Assignee: Andreas Lehmkühler

Closed this issue as there has been no further input in the last 2.5 years.
> Performance regression from 0.7.3 to 0.8.0
> ------------------------------------------
>                 Key: PDFBOX-556
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-556
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Lars Torunski
>            Assignee: Andreas Lehmkühler
>         Attachments: screenshot-1.jpg
> After upgrading from version 0.7.3 to 0.8.0, our PDF indexing for Lucene takes a lot longer than expected.
> E.g. a single pdf needs 1150ms to be indexed compared to 750ms with version 0.7.3 ==>
> My first thought was that more PDFs are indexed, or even indexed correctly, with 0.8.0. But that shouldn't have an impact of more than 50%.
> Profiling with YourKit shows that a lot of time is spent in the method BaseParser.readUntilEndStream and its invocation of cmpCircularBuffer. Maybe somebody can find out how to improve the performance.
> The method readUntilEndStream also handles endobj tags in the stream, which of course impacts the performance, but this is OK.
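
For readers unfamiliar with the pattern described in the report, the following is a minimal sketch of a byte-by-byte scan that keeps the most recent bytes in a circular buffer and compares that buffer against the "endstream" and "endobj" keywords after every byte. It is not PDFBox's actual code: the class name EndStreamScanSketch and the methods ringEndsWith and readUntilKeyword are hypothetical names chosen for illustration, and the real BaseParser.readUntilEndStream and cmpCircularBuffer may differ in detail. The sketch only shows why such a loop can show up as a hot spot in a profiler: every input byte triggers up to two keyword comparisons.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the PDFBox sources.
class EndStreamScanSketch {

    static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);
    static final byte[] ENDOBJ = "endobj".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the most recently written bytes in the ring equal the keyword.
    // writePos is the slot that will be written next; total is the number of bytes seen so far.
    static boolean ringEndsWith(byte[] ring, int writePos, long total, byte[] keyword) {
        if (total < keyword.length) {
            return false; // not enough input yet to contain the keyword
        }
        for (int i = 0; i < keyword.length; i++) {
            // Walk backwards from the newest byte, wrapping around the ring.
            int idx = Math.floorMod(writePos - 1 - i, ring.length);
            if (ring[idx] != keyword[keyword.length - 1 - i]) {
                return false;
            }
        }
        return true;
    }

    // Reads bytes until "endstream" or "endobj" is seen and returns everything
    // before the keyword. If neither keyword occurs, all bytes read are returned.
    static byte[] readUntilKeyword(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] ring = new byte[ENDSTREAM.length]; // long enough for both keywords
        int writePos = 0;
        long total = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                ring[writePos] = (byte) b;
                writePos = (writePos + 1) % ring.length;
                total++;
                // This per-byte check is the costly part of the loop.
                byte[] hit = null;
                if (ringEndsWith(ring, writePos, total, ENDSTREAM)) {
                    hit = ENDSTREAM;
                } else if (ringEndsWith(ring, writePos, total, ENDOBJ)) {
                    hit = ENDOBJ;
                }
                if (hit != null) {
                    byte[] all = out.toByteArray();
                    return Arrays.copyOf(all, all.length - hit.length);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In this sketch the loop does constant work per byte, but the constant is not small: each byte is copied, stored in the ring, and compared against two keywords. One common way to reduce that cost, stated here only as a general idea and not as what PDFBox did, is to read in larger blocks and run the comparison only when the current byte matches the last character of a keyword.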


