poi-user mailing list archives

From Mark Barton2 <mark.bar...@redwood.com>
Subject RE: Large Reports
Date Thu, 23 Apr 2009 19:11:00 GMT

A similar question:
I simply want to extract the text from an .xlsx file. The file is 20 MB and
rather sparse, and I am reading it with POIXMLTextExtractor on a JVM with
990 MB of memory. I get an OutOfMemoryError. What do you recommend?
Thanks,
Mark Barton 


John Borys wrote:
> 
> Thank you.  This is a tremendous help.  Your time is greatly appreciated.
> 
> John K. Borys
> Projects & Financial Controls
> Desk: 312.930.3134
> John.Borys@cmegroup.com
> 
> CME Group
> A CME/Chicago Board of Trade Company
> 20 S. Wacker
> Chicago, Illinois 60606
> http://www.cmegroup.com/
> 
> -----Original Message-----
> From: Yegor Kozlov [mailto:yegor@dinom.ru]
> Sent: Sunday, December 07, 2008 8:24 AM
> To: POI Users List
> Subject: Re: Large Reports
> 
> I created an example demonstrating how to generate large workbooks and
> avoid OutOfMemory:
> http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java
> 
> It works as I suggested:
>   1. creates a template workbook
>   2. generates a sample XML with random data. It can be a really large XML
> with millions of rows and thousands of columns.
>   3. substitutes the sheet in the template with the generated xml
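
Step 3 above, swapping the generated XML into the template, can be sketched with nothing but java.util.zip, because an .xlsx file is a ZIP archive. This is a minimal sketch and not the code of BigGridDemo itself: the class name SheetSubstitution, the method signature, and the entry name xl/worksheets/sheet1.xml used in main are assumptions for illustration, and the real entry name should be taken from the template.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: copies a template .xlsx, replacing the content of one entry. */
final class SheetSubstitution {

    static void substitute(InputStream templateZip, String entryName,
                           InputStream newContent, OutputStream result) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(templateZip);
             ZipOutputStream zout = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh ZipEntry avoids reusing the sizes and CRC recorded in the template.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(entryName)) {
                    copy(newContent, zout);   // generated sheet XML replaces the original
                } else {
                    copy(zin, zout);          // every other part is copied unchanged
                }
                zout.closeEntry();
            }
        }
    }

    // Copies in fixed-size chunks, so memory use does not grow with the sheet size.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: SheetSubstitution <template.xlsx> <sheet.xml> <out.xlsx>");
            return;
        }
        try (InputStream template = new FileInputStream(args[0]);
             InputStream sheetXml = new FileInputStream(args[1]);
             OutputStream out = new FileOutputStream(args[2])) {
            // Assumed entry name; check the template for the actual sheet part.
            substitute(template, "xl/worksheets/sheet1.xml", sheetXml, out);
        }
    }
}
```

Both the template and the generated XML are streamed through a small buffer, so the size of the sheet does not affect heap usage.
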
> 
> It's not a ready-to-use API, but rather a proof of concept. To use more
> advanced features (merged cells, custom row heights or column widths,
> conditional formats, etc.) you will need to study the SpreadsheetML
> documentation and enhance the demo.
> 
> Yegor
> 
>> Yegor,
>>
>> Thank you for your excellent response.  I figured I would have to resort
>> to something like this.
>>
>> You mentioned I would need to "create a template file using poi-ooxml".
>>
>> How do I do this? Is there any documentation, or are there any tutorials,
>> on the subject?
>>
>> John K. Borys
>>
>>
>> -----Original Message-----
>> From: Yegor Kozlov [mailto:yegor@dinom.ru]
>> Sent: Tuesday, December 02, 2008 10:54 AM
>> To: POI Users List
>> Subject: Re: Large Reports
>>
>> Unfortunately, poi-ooxml has quite an appetite for memory. Under the same
>> conditions you will get OutOfMemory at a smaller number of rows, I would
>> say about half as many. Increasing the JVM heap will help up to a certain
>> limit: if you allocate 2 GB (the limit for a 32-bit JVM), you will be able
>> to generate 100K rows but not 1 million.
>>
>> The memory requirement depends on the density of the row-cell grid. Sparse
>> rows require less memory than rows with every cell set.
>>
>> If you need to generate such large worksheets, I would recommend direct
>> streaming in XML.
>>
>> The approach would be to create a template file using poi-ooxml and set up
>> sheets, number formats, cell styles, etc. Then write a custom application
>> that streams the data to a text file. You don't need deep knowledge of the
>> SpreadsheetML format for that, just follow the pattern in the template. The
>> final step would be to inject this file into the template.
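
The "stream the data to a text file" step described above might look like the following. It is a minimal sketch under stated assumptions: the class SheetXmlWriter and its methods are hypothetical names, only inline-string and numeric cells are handled, and styles are left out. Each row is written and then forgotten, so memory use stays flat regardless of the row count.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical writer that emits SpreadsheetML rows one at a time. */
final class SheetXmlWriter {
    private final PrintWriter out;
    private int rowNumber; // one-based number of the row currently open

    SheetXmlWriter(Writer target) {
        this.out = new PrintWriter(target);
    }

    void beginSheet() {
        out.print("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.print("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
        out.print("<sheetData>");
    }

    void endSheet() {
        out.print("</sheetData></worksheet>");
        out.flush();
    }

    void beginRow(int zeroBasedRow) {
        rowNumber = zeroBasedRow + 1;
        out.print("<row r=\"" + rowNumber + "\">");
    }

    void endRow() {
        out.print("</row>");
    }

    // Finite values only: NaN and infinity are not valid cell values.
    void numberCell(int zeroBasedCol, double value) {
        out.print("<c r=\"" + columnName(zeroBasedCol) + rowNumber + "\"><v>" + value + "</v></c>");
    }

    // Inline strings avoid building a shared strings table in memory.
    void textCell(int zeroBasedCol, String value) {
        out.print("<c r=\"" + columnName(zeroBasedCol) + rowNumber
                + "\" t=\"inlineStr\"><is><t>" + escape(value) + "</t></is></c>");
    }

    /** Converts 0 to A, 25 to Z, 26 to AA, and so on. */
    static String columnName(int zeroBasedCol) {
        StringBuilder sb = new StringBuilder();
        for (int n = zeroBasedCol + 1; n > 0; n = (n - 1) / 26) {
            sb.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return sb.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // In-memory target for the demo; real data would go to a UTF-8 file writer.
        StringWriter buffer = new StringWriter();
        SheetXmlWriter sheet = new SheetXmlWriter(buffer);
        sheet.beginSheet();
        for (int r = 0; r < 3; r++) {
            sheet.beginRow(r);
            sheet.textCell(0, "row " + r);
            sheet.numberCell(1, r * 1.5);
            sheet.endRow();
        }
        sheet.endSheet();
        System.out.println(buffer);
    }
}
```

The resulting file is what gets injected into the template in place of the original sheet part.
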
>>
>> It's not trivial, but it should be possible.
>>
>> Regards,
>> Yegor
>>
>>> I have been tasked with generating enterprise reports and writing them
>>> to Excel spreadsheets. When using POI, the program crashes after about
>>> 30,000 records are processed. Our system requires millions of records
>>> to be processed. POI's limit is a little over 65K rows (65,536 in the
>>> .xls format), while Excel 2007 can handle over a million rows. Is there a
>>> tool available, either open source or for purchase, that can handle
>>> writing large quantities of data (over 1 million rows) to an Excel
>>> spreadsheet? This data will in turn be used to populate pivot tables or
>>> be merged with an Excel template.
>>>
>>> John K. Borys
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
>>
>>
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Large-Reports-tp20795592p23197679.html
Sent from the POI - User mailing list archive at Nabble.com.



