Date: Fri, 13 Dec 2013 05:45:48 +0000 (GMT)
From: Nick Burch
To: POI Users List
Cc: Eric Hohnbaum
Subject: Re: Feature request for adding custom properties to a document.

On Mon, 18 Nov 2013, Eric Hohnbaum wrote:
> When using a .DOCX the problem is different. The issue becomes the
> performance of writing each new property once the size of the
> collection reaches into the thousands.
> The first thousand or so
> properties aren't that bad, but it gets much worse as the collection gets
> larger. In my time trials, timing each batch of 25 property
> writes, the first batch of 25 entries took 0.028 seconds, the batch at
> the 1k mark took 1.1 seconds, the batch at 2k took 4.3 seconds, and the
> batch at 3k took 10.1 seconds. I have reason to believe the size of the
> dataset may reach much larger than 3k.

Are you able to profile the code a bit more, and see where the time is being taken? If it's in xmlbeans, we may have limited means to fix it, but it may be possible to change our calls. If it's in POI itself, then we can maybe do the lookups / checking in a different way to speed it up.

If you can, I'd suggest you open a new bug in bugzilla for this, then attach the results of your profiling (both from your email, and from a bit more to check exactly where in contains() the time goes), then we can take it from there.

Nick
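
As a rough first check before profiling, the batch timings quoted above can be turned into a growth estimate. The sketch below is only an illustration: the class and method names (BatchGrowth, growthExponent) are made up for this example and are not part of POI, and it assumes the batch time follows a simple power law t ~ n^k, which four data points cannot confirm.

```java
public class BatchGrowth {

    /**
     * Estimates k in t ~ n^k from two samples, where n is the collection
     * size when the batch was written and t is the time for that batch.
     * Hypothetical helper for illustration only; not a POI API.
     */
    static double growthExponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Figures taken from the quoted email: seconds per batch of 25 writes
        // at roughly 1k, 2k and 3k existing properties.
        double[] sizes = {1000, 2000, 3000};
        double[] seconds = {1.1, 4.3, 10.1};

        // Compare every pair of samples and print the implied exponent.
        for (int i = 0; i < sizes.length; i++) {
            for (int j = i + 1; j < sizes.length; j++) {
                double k = growthExponent(sizes[i], seconds[i], sizes[j], seconds[j]);
                System.out.printf("%.0f -> %.0f properties: exponent ~ %.2f%n",
                        sizes[i], sizes[j], k);
            }
        }
    }
}
```

For the quoted figures, all three pairs give an exponent close to 2, so the cost of a fixed-size batch appears to grow roughly with the square of the collection size. That would fit each write doing more than a single pass over the existing properties, but it is only a reading of the numbers; a profile is still needed to see where the time actually goes.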