poi-user mailing list archives

From Jack of Shadows <somerandomlo...@gmail.com>
Subject Re: SSPerformanceTest: Is the FAQ still accurate?
Date Mon, 18 Apr 2016 14:59:34 GMT
Yeah, no. 4 gigs of RAM per 55000 lines (that's around 75 KB per row) is not a
balancing act, it's ridiculous :-)
I've ended up using LibreOffice's API instead. The API itself is pretty
ugly and the documentation is scattered between LibreOffice and OpenOffice,
but everything seems to work.
And it uses under 300 MB of RAM. Haha! So long, suckers! :-)
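
For reference, the per-row figures quoted in this thread come from dividing the
reported memory use by the row count. Below is a minimal Java sketch of that
arithmetic, assuming decimal units (1 GB = 10^9 bytes). The class and method
names are made up for illustration and are not part of POI or LibreOffice.

```java
public class RowMemoryEstimate {

    /** Average bytes of memory per spreadsheet row (integer division, rounds down). */
    static long bytesPerRow(long totalBytes, long rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        return totalBytes / rows;
    }

    public static void main(String[] args) {
        long rows = 55_000L; // row count mentioned in this thread

        // Memory figures mentioned in this thread, in decimal bytes (an assumption).
        System.out.println("4 GB:   " + bytesPerRow(4_000_000_000L, rows) + " bytes/row");
        System.out.println("1.5 GB: " + bytesPerRow(1_500_000_000L, rows) + " bytes/row");
        System.out.println("300 MB: " + bytesPerRow(300_000_000L, rows) + " bytes/row");
    }
}
```

With decimal units this works out to roughly 73 KB per row at 4 GB, 27 KB at
1.5 GB, and about 5 KB at 300 MB.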

On Tue, Apr 12, 2016 at 3:02 PM, Javen O'Neal <javenoneal@gmail.com> wrote:

> Memory consumption and performance are a balancing act. POI adds data
> structures on top of the XML beans that make lookups faster, but at
> the cost of duplicating the memory across multiple data structures.
> Until we can read in an OOXML file, write data structures that can
> fully capture the XML content, free the XML beans, and recreate the
> XML beans on write, and do so without corrupting or losing
> information, POI will be a high-memory consumer. Additionally, we're
> using XMLBeans 2.6, an older (discontinued) library that may not be as
> efficient as other XML libraries.
>
> Also consider that other libraries that can read and write Microsoft
> Office files support a different set of features and are
> performance-optimized (with auxiliary data structures) in certain
> cases.
>
> I hope that clears up some of the questions/concern you had. Feel free
> to use memory management tools such as Hotspot to figure out where all
> the memory is going (a lot of strings stored in the XML nodes, if I
> remember correctly) and submit patches where you think we could be
> doing a better job on memory consumption without sacrificing
> performance.
>
> On Tue, Apr 12, 2016 at 4:36 AM, Jack of Shadows
> <somerandomlogin@gmail.com> wrote:
> > Yes, that is understandable. However, in my tests memory usage to parse a
> > file with 55000 rows is 1.5 GB -- isn't that a bit too high?
> > I've tested LibXL with the same file -- memory usage is just 240 MB.
> >
> > On Tue, Apr 12, 2016 at 2:09 PM, Murphy, Mark <murphymdev@metalexmfg.com>
> > wrote:
> >
> >> XSSF is an XML document. Given that XML is generally about 70-80% overhead
> >> vs. data, it is not surprising that binary spreadsheets (which can be
> >> optimized, and have very little overhead) are more memory efficient. In
> >> addition, XML must be parsed, but binary documents can frequently be
> >> accessed using pointers and data structures. That gives the binary formats
> >> a performance edge, which can be significant. I'm not sure how Microsoft
> >> handles spreadsheets internally, but maybe they keep an internal binary
> >> format, and then write it to whatever format is requested on save rather
> >> than using an internal XML representation for an XML spreadsheet, which is
> >> what POI is doing.
> >>
> >> -----Original Message-----
> >> From: Jack of Shadows [mailto:somerandomlogin@gmail.com]
> >> Sent: Monday, April 11, 2016 7:46 AM
> >> To: POI Users List
> >> Subject: Re: SSPerformanceTest: Is the FAQ still accurate?
> >>
> >> XSSF is basically unusable. 25000 or 50000 isn't that many rows. Memory
> >> consumption is pretty high too.
> >> That's really confusing. I wouldn't have been surprised if HSSF performed
> >> poorly, but it actually works better.
> >> Oh well, whatever, I guess I'll have to use SXSSF instead.
> >>
> >> On Mon, Apr 11, 2016 at 12:04 AM, Dominik Stadler <dominik.stadler@gmx.at>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > Not sure which exact machine spec the information in the FAQ is based
> >> > on, maybe there is something that can have quite a big influence on
> >> > runtime of this sample for XSSF, e.g. which actual JDK is used,
> >> > Linux/Windows, ... ?!
> >> >
> >> > I did a quick run of it across various versions of POI to see if we
> >> > degraded performance at some point, but for me it has always been
> >> > this way, i.e. HSSF very quick, SXSSF fairly quick (though very
> >> > slow in early releases) and XSSF quite a bit slower. Maybe we need to
> >> > adjust the FAQ entry some more here to set correct expectations?
> >> >
> >> > (Exact numbers here are not that relevant as I used my 6+ year old
> >> > laptop where I was doing other things at the same time, albeit no CPU
> >> > intensive things, JVM was Sun 6.0, Linux Ubuntu, 25000 rows, 25 cols)
> >> >
> >> >
> >> > latest-2016-04-10:
> >> >
> >> > Elapsed 2 seconds
> >> > Elapsed 15 seconds
> >> > Elapsed 5 seconds
> >> >
> >> >
> >> > 2014-03-22 (the FAQ-Entry was added)
> >> >
> >> > Elapsed 1 seconds
> >> > Elapsed 14 seconds
> >> > Elapsed 3 seconds
> >> >
> >> >
> >> > 3.10:
> >> >
> >> > Elapsed 2 seconds
> >> > Elapsed 14 seconds
> >> > Elapsed 3 seconds
> >> >
> >> >
> >> > 3.9:
> >> >
> >> > Elapsed 1 seconds
> >> > Elapsed 12 seconds
> >> > Elapsed 3 seconds
> >> >
> >> >
> >> > 3.8:
> >> >
> >> > Elapsed 2 seconds
> >> > Elapsed 15 seconds
> >> > Elapsed 3 seconds
> >> >
> >> >
> >> > initial checkin of SSPerformanceTest:
> >> >
> >> > Elapsed 1 seconds
> >> > Elapsed 14 seconds
> >> > Elapsed 47 seconds
> >> >
> >> >
> >> > Dominik.
> >> >
> >> >
> >> >
> >> >
> >> > On Sun, Apr 10, 2016 at 5:59 PM, Jack <somerandomlogin@gmail.com> wrote:
> >> >
> >> > > I'm having the exact same issue; I tracked down this message from
> >> > > StackOverflow.
> >> > > I've tested read performance on an XLS and an XLSX file with identical
> >> > > content (around 75000 rows, 25 columns).
> >> > > HSSF takes under 5 sec; XSSF takes 15-20 sec.
> >> > >
> >> > > Any idea what the issue with XSSF performance is?
> >> > >
> >> > >
> >> > > On 15.02.2016 17:00, Drew Spencer wrote:
> >> > >
> >> > >> Mike DeHaan <mike <at> mikeandzoya.com> writes:
> >> > >>
> >> > >>> As a followup, a user has replied to my stack overflow post with some
> >> > >>> information that might be helpful in tracking this issue down. Here is
> >> > >>> the link to his post:
> >> > >>>
> >> > >>> http://stackoverflow.com/a/34266795/4471563
> >> > >>>
> >> > >>> I ran the same tests in my environments and came up with similar
> >> > >>> numbers.
> >> > >>>
> >> > >>> -Mike DeHaan
> >> > >>
> >> > >> I have also asked the same question. Would love to get an answer to this
> >> > >> either way. My similar post on StackOverflow is here:
> >> > >>
> >> > >> http://stackoverflow.com/questions/34995058/apache-poi-much-quicker-using-hssf-than-xssf-what-next
> >> > >>
> >> > >> I received a good answer with the link to the streaming reader,
> >> > >> but unfortunately I don't think I can use it because my code runs
> >> > >> on app engine.
> >> > >>
> >> > >> Thanks to anyone that can help.
> >> > >>
> >> > >> Drew Spencer
> >> > >>
> >> > >>
> >> > >> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> >> > >> For additional commands, e-mail: user-help@poi.apache.org
