incubator-odf-dev mailing list archives

From "Rob Weir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ODFTOOLKIT-333) Exception in thread "main" java.lang.OutOfMemoryError
Date Fri, 17 Aug 2012 17:59:38 GMT

    [ https://issues.apache.org/jira/browse/ODFTOOLKIT-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436921#comment-13436921
] 

Rob Weir commented on ODFTOOLKIT-333:
-------------------------------------

Very large spreadsheets are a weakness of ODF.  The per-cell markup overhead bloats the
file to many times the size of the raw data.  ZIP compression brings it back down to a small
fraction of that, but the time and memory required to decompress and parse it are large.  This
is an issue for OOXML (.xlsx) as well.  That was one reason Microsoft also created a binary
encoding of spreadsheet files, with an XLSB extension:  http://blogs.msdn.com/b/dmahugh/archive/2006/08/22/712835.aspx

On top of that, the ODF Toolkit is based on a DOM representation of the document.  This makes
it wonderful for random access to various parts of the document.  The developer can access
any cell at any time, can add content then styles, or styles then content, and generally work
in whatever order they like.  But this flexibility comes with memory overhead.
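A minimal JAXP sketch of the trade-off (a toy stand-in, not the Toolkit's actual code): the entire
tree must be instantiated in memory before any node can be read, and that is exactly what buys the
random access.  The element names mirror ODF's table markup; the class and method names are invented
for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomRandomAccess {

    // Parse the whole XML into a DOM, then read one cell by position.
    static String cellText(String xml, int index) throws Exception {
        // The entire tree is built in memory before any node is touched...
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // ...which is what makes jumping to an arbitrary cell possible.
        return doc.getElementsByTagName("table:table-cell")
                .item(index).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table:table xmlns:table='urn:t'>"
                + "<table:table-cell>A1</table:table-cell>"
                + "<table:table-cell>B1</table:table-cell>"
                + "</table:table>";
        System.out.println(cellText(xml, 1)); // prints B1
    }
}
```

For a 180 MB content.xml the DOM node objects cost several times that again in heap, which is
where the OutOfMemoryError below comes from.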

There is no easy fix here that I can see.  But there is one very useful approach we could consider:
add another module to the Toolkit, maybe called ODFSAX or ODFStreamer or something like that.
As the name suggests, we would do a SAX parse and, instead of instantiating the entire document,
define event handlers like onHeader(), onFooter(), onParagraph(), etc. that would be called
in document order.  This is a more constrained solution -- read-only, single pass, no random
access -- and it would require the developer to plan their logic in a way that fits the single-pass
view of the document, but it would perform far, far better for such uses.
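A hypothetical sketch of what such a streaming module could look like, built on plain JAXP SAX.
The class name OdfStreamSketch and the onCell() callback are invented for illustration and are
not part of any existing Toolkit API; a real module would of course first open the ZIP entry for
content.xml.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class OdfStreamSketch {

    // Callback interface an ODFStreamer-style module might expose:
    // fired once per cell, in document order, never holding the whole tree.
    interface CellHandler {
        void onCell(String value);
    }

    public static void streamCells(InputStream contentXml, CellHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(contentXml, new DefaultHandler() {
            private boolean inCell;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if (qName.equals("table:table-cell")) { inCell = true; text.setLength(0); }
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (inCell) text.append(ch, start, len);  // accumulate cell text
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("table:table-cell")) { inCell = false; handler.onCell(text.toString()); }
            }
        });
    }
}
```

Only the current cell's text is ever buffered, so memory stays flat no matter how many rows the
spreadsheet has -- which is the whole point of the proposal.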
                
> Exception in thread "main" java.lang.OutOfMemoryError
> -----------------------------------------------------
>
>                 Key: ODFTOOLKIT-333
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-333
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: odfdom, performance, simple api
>    Affects Versions: 0.8.7, 0.8.8
>            Reporter: Vicente Villegas Larios
>         Attachments: bigFile.ods
>
>
> I have been facing an "Out of Memory" issue: I'm trying to read a 1.4 MB ODS file, and
> the ODF Toolkit throws the following exception.
> Exception in thread "main" java.lang.OutOfMemoryError
> 	at java.util.Arrays.copyOfRange(Unknown Source)
> 	at java.util.Arrays.copyOf(Unknown Source)
> 	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:105)
> 	at org.odftoolkit.odfdom.pkg.StreamHelper.stream(StreamHelper.java:74)
> 	at org.odftoolkit.odfdom.pkg.StreamHelper.transformStream(StreamHelper.java:48)
> 	at org.odftoolkit.odfdom.pkg.OdfPackage.getBytes(OdfPackage.java:1584)
> 	at org.odftoolkit.odfdom.pkg.OdfPackage.getInputStream(OdfPackage.java:1650)
> 	at org.odftoolkit.odfdom.pkg.OdfFileDom.initialize(OdfFileDom.java:137)
> 	at org.odftoolkit.odfdom.dom.OdfContentDom.initialize(OdfContentDom.java:60)
> 	at org.odftoolkit.odfdom.pkg.OdfFileDom.<init>(OdfFileDom.java:87)
> 	at org.odftoolkit.odfdom.dom.OdfContentDom.<init>(OdfContentDom.java:50)
> 	at org.odftoolkit.odfdom.pkg.OdfFileDom.newFileDom(OdfFileDom.java:110)
> 	at org.odftoolkit.odfdom.pkg.OdfPackageDocument.getFileDom(OdfPackageDocument.java:280)
> 	at org.odftoolkit.odfdom.dom.OdfSchemaDocument.getFileDom(OdfSchemaDocument.java:393)
> 	at org.odftoolkit.odfdom.dom.OdfSchemaDocument.getContentDom(OdfSchemaDocument.java:197)
> 	at org.odftoolkit.simple.Document.getContentRoot(Document.java:762)
> 	at org.odftoolkit.simple.SpreadsheetDocument.getContentRoot(SpreadsheetDocument.java:217)
> I know that ODF files are ZIP archives, so I changed the extension to "zip" and extracted
> it to a folder; the extracted folder is about 180 MB.
> Besides that, I exported the content to an "xls" file and used POI to perform the same
> operation, and it worked fine with POI. It seems the ODF Toolkit doesn't support reading
> big files.
> I noticed that the content is stored in an XML file; thinking about it, it seems the ODF
> Toolkit is using a DOM parser instead of a SAX parser.
> Can anyone help me fix this problem?
> I'm attaching the "ods" file with the data that triggers the out-of-memory error.
> Many thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
