Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id CD99E200D5D for ; Wed, 6 Dec 2017 01:19:18 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id CC248160C1C; Wed, 6 Dec 2017 00:19:18 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 1EC85160C1B for ; Wed, 6 Dec 2017 01:19:17 +0100 (CET) Received: (qmail 72969 invoked by uid 500); 6 Dec 2017 00:19:17 -0000 Mailing-List: contact dev-help@poi.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "POI Developers List" Delivered-To: mailing list dev@poi.apache.org Received: (qmail 72958 invoked by uid 99); 6 Dec 2017 00:19:17 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Dec 2017 00:19:17 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 5B18B1A129C for ; Wed, 6 Dec 2017 00:19:16 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.002 X-Spam-Level: X-Spam-Status: No, score=-0.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id OkbdlApVmiNN for ; Wed, 6 Dec 2017 00:19:15 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 44EBD5F256 for ; Wed, 6 Dec 2017 00:19:15 +0000 (UTC) Received: from asf-bz1-us-mid.priv.apache.org (nat1-us-mid.apache.org [23.253.172.122]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTPS id CE33DE0044 for ; Wed, 6 Dec 2017 00:19:14 +0000 (UTC) Received: by asf-bz1-us-mid.priv.apache.org (ASF Mail Server at asf-bz1-us-mid.priv.apache.org, from userid 33) id 6093B61165; Wed, 6 Dec 2017 00:19:12 +0000 (UTC) From: bugzilla@apache.org To: dev@poi.apache.org Subject: [Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference Date: Wed, 06 Dec 2017 00:19:12 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: POI X-Bugzilla-Component: SS Common X-Bugzilla-Version: 3.15-FINAL X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: gwoolsey@apache.org X-Bugzilla-Status: NEEDINFO X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: dev@poi.apache.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bz.apache.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 archived-at: Wed, 06 Dec 2017 00:19:19 -0000 https://bz.apache.org/bugzilla/show_bug.cgi?id=3D61841 --- Comment #5 from Greg Woolsey --- Changes in r1817252 Interesting. In a local test with the attached sample file, I found these results: 45s (second run)=20 with current codebase issuing FormulaEvaluator.evaluateAll() on the workboo= k. 19s=20 By just changing XSSFEvaluationSheet.getCell(row, col) to immediately return null if the row index > sheet.getLastRowNum()=20 14.4s when XSSFEvaluationSheet caches the value of getLastRowNum(), since it comes from a TreeMap.lastKey() which has to navigate the tree each time to find t= he last key. 12.1s after optimizing the blank cell tracking a bit to know about the last row w= ith data. That's all without changing anything int he VLOOKUP evaluation and still iterating over the max # of rows per column. Of this remaining time, about 2/3 is taken up in the formula evaluation cac= hing and tracking mechanism. Bypassing it for null cells causes test failures, which shows it is necessary, but relatively expensive. It appears to try to optimize and minimize the "empty cell" rectangular regions it holds. but assumes processing by row then column. That may be a memory/time optimizat= ion we want to consider allowing additional strategies for. Note that this shortcut logic doesn't change the result of any methods, only avoids busywork that didn't apply to the "nonexistent cell" cases. This doesn't optimize VLOOKUP directly, but is about 70% improvement sufficient? Changing the VLOOKUP code itself is actually significantly more complex, because POI handles sheets by row internally, and columns are second-class constructs. There is no easy way to determine the last row with data in a column other than iterating over all defined rows. With these optimization= s, the extra iterations should fail fast. --=20 You are receiving this mail because: You are the assignee for the bug.= --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org For additional commands, e-mail: dev-help@poi.apache.org