poi-user mailing list archives

From Jack of Shadows <somerandomlo...@gmail.com>
Subject Re: SSPerformanceTest: Is the FAQ still accurate?
Date Tue, 12 Apr 2016 11:36:42 GMT
Yes, that is understandable. However, in my tests memory usage to parse a
file with 55000 rows is 1.5 GB -- isn't that a bit too high?
I've tested LibXL with the same file -- memory usage is just 240 MB.
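
For anyone comparing numbers, here is a minimal sketch of event-based (SAX) reading, which walks the sheet XML without building an in-memory tree. This is not POI code and says nothing about how POI works internally: it uses only JDK classes. The class name `SheetRowCounter`, its methods, and the part name `xl/worksheets/sheet1.xml` are assumptions for illustration; the actual part name depends on the workbook.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetRowCounter {

    /** Counts <row> and <c> elements as they stream past; returns {rows, cells}. */
    static long[] count(InputStream sheetXml) throws Exception {
        final long[] counts = new long[2]; // [0] = rows, [1] = cells
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace-aware so localName is "row"/"c" whether or not a prefix is used.
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(sheetXml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("row".equals(localName)) {
                    counts[0]++;
                } else if ("c".equals(localName)) {
                    counts[1]++;
                }
            }
        });
        return counts;
    }

    /** Opens one XML part inside an .xlsx (a zip archive) and counts its rows and cells. */
    static long[] countInXlsx(String xlsxPath, String partName) throws Exception {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IllegalArgumentException("No such part: " + partName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return count(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        long[] counts;
        if (args.length > 0) {
            // Assumed part name; check the workbook if the first sheet lives elsewhere.
            counts = countInXlsx(args[0], "xl/worksheets/sheet1.xml");
        } else {
            // Tiny in-memory stand-in for a sheet part, so the sketch runs without a file.
            String sample = "<worksheet><sheetData>"
                    + "<row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2</v></c></row>"
                    + "<row r=\"2\"><c r=\"A2\"><v>3</v></c></row>"
                    + "</sheetData></worksheet>";
            counts = count(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        }
        System.out.println(counts[0] + " rows, " + counts[1] + " cells");
    }
}
```

Running this against the same XLSX file, with the path as the first argument, should give a rough idea of how much of the time and memory goes to the object model rather than to XML parsing as such.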

On Tue, Apr 12, 2016 at 2:09 PM, Murphy, Mark <murphymdev@metalexmfg.com>
wrote:

> XSSF is an XML document. Given that XML is generally about 70-80% overhead
> vs. data, it is not surprising that binary spreadsheets (which can be
> optimized, and have very little overhead) are more memory efficient. In
> addition, XML must be parsed, but binary documents can frequently be
> accessed using pointers and data structures. That gives the binary formats
> a performance edge, which can be significant. I'm not sure how Microsoft
> handles spreadsheets internally, but maybe they keep an internal binary
> format, and then write it to whatever format is requested on save rather
> than using an internal XML representation for an XML spreadsheet, which is
> what POI is doing.
>
> -----Original Message-----
> From: Jack of Shadows [mailto:somerandomlogin@gmail.com]
> Sent: Monday, April 11, 2016 7:46 AM
> To: POI Users List
> Subject: Re: SSPerformanceTest: Is the FAQ still accurate?
>
> XSSF is basically unusable. 25000 or 50000 isn't that many rows. Memory
> consumption is pretty high too.
> That's really confusing, I wouldn't have been surprised if HSSF performed
> poorly -- but it actually works better.
> Ohh well, whatever, I guess I'd have to use SXSSF instead.
>
> On Mon, Apr 11, 2016 at 12:04 AM, Dominik Stadler <dominik.stadler@gmx.at>
> wrote:
>
> > Hi,
> >
> > Not sure which exact machine spec the information in the FAQ is based
> > on. Maybe something has quite a big influence on the runtime of this
> > sample for XSSF, e.g. which actual JDK is used, Linux/Windows, ...?!
> >
> > I did a quick run of it across various versions of POI to see if we
> > degraded performance at some point, but for me it was always this
> > way, i.e. HSSF very quick, SXSSF fairly quick (after being very
> > slow in early releases) and XSSF quite a bit slower. Maybe we need to
> > adjust the FAQ entry some more here to set correct expectations?
> >
> > (Exact numbers here are not that relevant as I used my 6+ year old
> > laptop where I was doing other things at the same time, albeit no CPU
> > intensive things, JVM was Sun 6.0, Linux Ubuntu, 25000 rows, 25 cols)
> >
> >
> > latest-2016-04-10:
> >
> > Elapsed 2 seconds
> > Elapsed 15 seconds
> > Elapsed 5 seconds
> >
> >
> > 2014-03-22 (the FAQ-Entry was added)
> >
> > Elapsed 1 seconds
> > Elapsed 14 seconds
> > Elapsed 3 seconds
> >
> >
> > 3.10:
> >
> > Elapsed 2 seconds
> > Elapsed 14 seconds
> > Elapsed 3 seconds
> >
> >
> > 3.9:
> >
> > Elapsed 1 seconds
> > Elapsed 12 seconds
> > Elapsed 3 seconds
> >
> >
> > 3.8:
> >
> > Elapsed 2 seconds
> > Elapsed 15 seconds
> > Elapsed 3 seconds
> >
> >
> > initial checkin of SSPerformanceTest:
> >
> > Elapsed 1 seconds
> > Elapsed 14 seconds
> > Elapsed 47 seconds
> >
> >
> > Dominik.
> >
> >
> >
> >
> > On Sun, Apr 10, 2016 at 5:59 PM, Jack <somerandomlogin@gmail.com> wrote:
> >
> > > I'm having the exact same issue; I tracked down this message from
> > > StackOverflow.
> > > I've tested read performance on XLS and XLSX files with identical
> > > content (around 75000 rows, 25 columns).
> > > HSSF takes under 5 sec; XSSF takes 15-20 sec.
> > >
> > > Any idea what is the issue with XSSF performance?
> > >
> > >
> > > On 15.02.2016 17:00, Drew Spencer wrote:
> > >
> > >> Mike DeHaan <mike <at> mikeandzoya.com> writes:
> > >>
> > >>> As a followup, a user has replied to my stack overflow post with
> > >>> some information that might be helpful in tracking this issue down.
> > >>> Here is the link to his post:
> > >>>
> > >>> http://stackoverflow.com/a/34266795/4471563
> > >>>
> > >>> I ran the same tests in my environments and came up with similar
> > >>> numbers.
> > >>>
> > >>> -Mike DeHaan
> > >>
> > >> I have also asked the same question. Would love to get an answer
> > >> to this either way. My similar post on StackOverflow is here:
> > >> http://stackoverflow.com/questions/34995058/apache-poi-much-quicker-using-hssf-than-xssf-what-next
> > >>
> > >> I received a good answer with the link to the streaming reader,
> > >> but unfortunately I don't think I can use it because my code runs
> > >> on app engine.
> > >>
> > >> Thanks to anyone that can help.
> > >>
> > >> Drew Spencer
> > >>
> > >>
> > >> -------------------------------------------------------------------
> > >> -- To unsubscribe, e-mail: user-unsubscribe@poi.apache.org For
> > >> additional commands, e-mail: user-help@poi.apache.org
> > >>
> > >>
> > >>
> > >
> > >
> >
>
