poi-dev mailing list archives

From Dominik Stadler <dominik.stad...@gmx.at>
Subject Re: poi 3.13 named cell look-up optimisation
Date Thu, 22 Oct 2015 19:01:18 GMT
Hi,

I think that if it is generally useful and does not break existing
usage patterns, we would be glad to incorporate your work, although
probably only a few people work with named cells in documents of that
size.

Depending on how much code is involved, we might need a contribution
agreement to ensure the contribution is handed over to the Apache
Software Foundation. Ideally, you should create a Bugzilla entry to
focus the discussion there, and either paste the changes into that
entry or provide a GitHub fork of the Apache POI GitHub repository so
that we can review your proposed improvements.

Dominik.

On Thu, Oct 22, 2015 at 4:18 PM,  <richard.hart@nl.pwc.com> wrote:
>
>
> Hello poi developers,
>
> I have made some modifications to poi 3.13 to reduce the time needed to
> generate an Excel workbook. This is based on an application we are
> building that models real-world business organisations.
>
> We have a model of an organisation with 58 units. Each unit has 3 main
> Charts of Accounts, where each chart has on average 50 accounts. The model
> ranges over 20 years. This equates to about 3 * 50 * 20 (3000) named cells
> on each of 58 sheets. The generation of the workbook takes 4:47, of which
> about 1 minute is spent generating the data being placed into the workbook
> cells. Analysis using VisualVM indicated that a great deal of CPU time was
> being spent in getName(String). Looking at the code, it appeared that much
> of that time is spent in HSSFName.setNameName(String) checking whether the
> given name is a duplicate, which is where getName(String) is eventually
> invoked.
>
> As a test to see if this could be improved, I modified several classes to
> include a Map<sheetNumber, Map<name, NameRecord>>. When a cell name is
> set, a duplicate can be found directly through the map. If the name is not
> a duplicate, it is added to the inner map as a key, under the outer key
> for the sheet number it belongs to.
>
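The two-level map described above could be sketched roughly as follows.
This is not the submitted patch and not POI code: the class name
SheetNameIndex, its methods, and the generic record type R (standing in
for NameRecord) are hypothetical. It also assumes that names are compared
case-insensitively and that Java 8 map methods are available.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of POI): indexes records by sheet number, then by name. */
final class SheetNameIndex<R> {
    // Outer key: sheet number. Inner key: normalised name. Value: the record for that name.
    private final Map<Integer, Map<String, R>> bySheet = new HashMap<>();

    // Assumption: names are matched case-insensitively, so keys are lower-cased.
    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    /** Returns the record registered for this name on this sheet, or null if there is none. */
    R get(String name, int sheetNumber) {
        Map<String, R> names = bySheet.get(sheetNumber);
        return names == null ? null : names.get(key(name));
    }

    /**
     * Registers a non-null record under (sheetNumber, name).
     * Returns false, leaving the index unchanged, if the name already exists on that sheet.
     */
    boolean add(String name, int sheetNumber, R record) {
        Map<String, R> names = bySheet.computeIfAbsent(sheetNumber, k -> new HashMap<>());
        return names.putIfAbsent(key(name), record) == null;
    }
}
```

With an index like this, the duplicate check when a name is set becomes a
hash look-up on the sheet number and the name instead of a scan over the
names already defined. The index would have to be kept in sync whenever a
name is renamed or removed, which this sketch does not cover.
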
> Once implemented, the running time was reduced to 3:14, about a 30%
> reduction in the time needed to build the workbook. The next most expensive
> method is now getStyle(), but the CPU time spent there is 1/10th of what
> getName() used, so nothing was done to try to improve it.
>
> My question to this group is: is there interest in the code changes? The
> getName(String) method still exists, but a user can choose to use
> getName(String, int sheetNumber) to do a fast look-up. It would be in our
> own interest not to have to maintain a poi project of our own, and for the
> poi development group to take a look at optimising the named cell look-up.
> I am willing to provide the few modified source files for your examination.
>
>
> Best regards/Met vriendelijke groet,
>
> Richard Hart
> PwC | Senior Architect
> Tel: +31 (0)64 321-8105
> Email: richard.hart@nl.pwc.com
> PricewaterhouseCoopers Accountants N.V. (KvK 34180285)
> Newtonlaan 205 | 3584 BH | P.O. Box 85096 | 3508 AB | Utrecht, The
> Netherlands
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org

