Date: Fri, 18 May 2012 15:37:07 +0000 (UTC)
From: Andreas Lehmkühler (JIRA)
To: dev@pdfbox.apache.org
Message-ID: <308934228.14385.1337355427232.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Closed] (PDFBOX-556) Performance regression from 0.7.3 to 0.8.0

     [ https://issues.apache.org/jira/browse/PDFBOX-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-556.
-------------------------------------

    Resolution: Won't Fix
      Assignee: Andreas Lehmkühler

Closed this issue as there has been no further input in the last 2.5 years.

> Performance regression from 0.7.3 to 0.8.0
> ------------------------------------------
>
>                 Key: PDFBOX-556
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-556
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Lars Torunski
>            Assignee: Andreas Lehmkühler
>         Attachments: screenshot-1.jpg
>
>
> After upgrading from version 0.7.3 to 0.8.0, our PDF indexing for Lucene takes considerably longer than expected.
> For example, a single PDF takes 1150 ms to index, compared to 750 ms with version 0.7.3 ==> roughly +50%.
> My first thought was that more PDFs are indexed, or indexed correctly, with 0.8.0, but that should not account for an increase of more than 50%.
> Profiling with YourKit shows that a lot of time is spent in the method BaseParser.readUntilEndStream and its invocation of cmpCircularBuffer. Maybe somebody can find out how to improve the performance here.
> The method readUntilEndStream also handles endobj tags in the stream, which of course affects performance, but this is OK.
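
To illustrate the kind of hot spot the report describes, here is a minimal Java sketch of scanning a byte stream for the "endstream" keyword with a precomputed failure table (Knuth-Morris-Pratt style), so that each input byte is compared only a bounded number of times instead of re-comparing a whole buffer per byte. This is not the PDFBox implementation: the class EndStreamScanner and its methods are hypothetical names, the sketch ignores the endobj handling mentioned in the report, and whether such a change would actually recover the reported 400 ms is an untested assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not PDFBox code: copies stream bytes up to "endstream".
final class EndStreamScanner {
    private static final byte[] KEYWORD =
            "endstream".getBytes(StandardCharsets.US_ASCII);
    // FAILURE[i] = length of the longest proper prefix of KEYWORD that is
    // also a suffix of KEYWORD[0..i].
    private static final int[] FAILURE = buildFailureTable(KEYWORD);

    private static int[] buildFailureTable(byte[] p) {
        int[] fail = new int[p.length];
        int k = 0;
        for (int i = 1; i < p.length; i++) {
            while (k > 0 && p[i] != p[k]) {
                k = fail[k - 1];
            }
            if (p[i] == p[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /**
     * Returns the bytes that precede the first "endstream" keyword.
     * If the keyword never occurs, every byte read is returned.
     */
    static byte[] readUntilEndStream(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int matched = 0; // number of keyword bytes currently matched
        int b;
        while ((b = in.read()) != -1) {
            while (matched > 0 && b != KEYWORD[matched]) {
                int fallback = FAILURE[matched - 1];
                // These pending bytes can no longer be part of a match.
                out.write(KEYWORD, 0, matched - fallback);
                matched = fallback;
            }
            if (b == KEYWORD[matched]) {
                matched++;
                if (matched == KEYWORD.length) {
                    return out.toByteArray();
                }
            } else {
                out.write(b);
            }
        }
        // End of input inside a partial match: flush what was held back.
        out.write(KEYWORD, 0, matched);
        return out.toByteArray();
    }

    /** Convenience wrapper for in-memory data; same behaviour as above. */
    static byte[] scan(byte[] data) {
        try {
            return readUntilEndStream(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The failure table matters even for this short keyword: "endstream" contains a second 'e', so after reading "endstre" followed by 'n' the scanner has to fall back to a one-byte match rather than start over. In real use the input stream would normally be buffered, since byte-at-a-time reads on an unbuffered stream are themselves expensive.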