incubator-odf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daisy Guo <daisy.yc...@gmail.com>
Subject Re: HugeData in a Spreadsheet
Date Tue, 08 Nov 2011 09:13:08 GMT
2011/10/29 Rob Weir <robweir@apache.org>:
> On Fri, Oct 28, 2011 at 12:57 AM, Wilson Barbosa
> <wilson.barbosa@gmail.com> wrote:
>> Hi Devin and Rob,
>>
>> this is an adaptation of my original code:
>>
>
> If I'm reading this right, you have 14 sheets, each one with 10,000
> rows and 13 columns of data.  That comes out to nearly 2 million
> cells.
>
> Aside from increasing your heap size, I'm not sure there is anything
> you can do in the current version of the ODF Toolkit to bring this
> down, if you're dealing with that amount of data.
>
> One trick might be to use temp files.  So create 14 temporary ODF
> documents, each with one sheet.  Then write some custom code to
> combine the temp files.  But don't instantiate an XML DOM for the
> final combine step.  Just create a new content.xml that concatenates
> the markup from the temp files.
>
> In the next release we might look at doing String.intern() on cell
> values, and see if that helps.  When a spreadsheet is used as a
> database, there are often many duplicate strings.

I believe we used String.intern() in the current code base. Maybe we
need to figure out a possible way to load a document partially.
>
> -Rob
>
>> 8<----
>> public void teste_table() throws Exception {
>>        int next_sheet = 0;
>>        long next_line = 1;
>>        // Cria objetos da planilha
>>        SpreadsheetDocument planilha = SpreadsheetDocument
>>                .newSpreadsheetDocument();
>>
>>        Table t = planilha.getSheetByIndex(0);
>>        t.setTableName("wks_" + next_sheet);
>>        for (long i = 1; i < 140000; i++) {
>>            t.getCellByPosition("A" +
>> String.valueOf(next_line)).setStringValue(
>>                    "*A******");
>>            t.getCellByPosition("B" +
>> String.valueOf(next_line)).setStringValue(
>>                    "*B******");
>>            t.getCellByPosition("C" +
>> String.valueOf(next_line)).setStringValue("*C*");
>>            t.getCellByPosition("D" +
>> String.valueOf(next_line)).setStringValue("*D*");
>>            t.getCellByPosition("E" +
>> String.valueOf(next_line)).setStringValue("*E*");
>>            t.getCellByPosition("F" +
>> String.valueOf(next_line)).setStringValue("*F*");
>>            t.getCellByPosition("G" +
>> String.valueOf(next_line)).setStringValue("*G*");
>>            t.getCellByPosition("H" +
>> String.valueOf(next_line)).setStringValue("*H*");
>>            t.getCellByPosition("I" +
>> String.valueOf(next_line)).setStringValue("*I*");
>>            t.getCellByPosition("J" +
>> String.valueOf(next_line)).setStringValue("*J*");
>>            t.getCellByPosition("K" +
>> String.valueOf(next_line)).setStringValue("*K*");
>>            t.getCellByPosition("L" +
>> String.valueOf(next_line)).setStringValue("*L*");
>>            t.getCellByPosition("M" +
>> String.valueOf(next_line)).setStringValue("*M*");
>>            //
>>            next_line++;
>>            //
>>            System.out.println(i);
>>            //
>>            if(i % 10000 == 0){
>>                next_sheet++;
>>                next_line = 1;
>>                t = planilha.appendSheet("wks" + next_sheet);
>>            }
>>
>>        }
>>        planilha.save("/tmp/big_worksheet.ods");
>>        planilha.close();
>>    }
>> 8<-------
>>
>> I had some variations, for example using a Cell object to better work with
>> the formatation and data values...
>>
>> I get a problem similar when using a TreeMap and i discovered a jdbm2
>> implementation of TreeMap in disk, but i dont know how to use a view of a
>> SpreadsheetDocument in disk.
>>
>> Thanks in advance,
>> Wilson.
>>
>>
>>
>>
>> 2011/10/25 Devin Han <devinhan@apache.org>
>>
>>> Hi Wilson,
>>>
>>> I test two cases.
>>>
>>> If the <<Logic of Application>> is
>>>            c = t.getCellByPosition(0, i);
>>>            c.setStringValue("i");
>>> There is no exception, everything works well.
>>>
>>> If change to:
>>>            c = t.getCellByPosition(i, i);
>>>            c.setStringValue("i");
>>>            System.out.println(i);
>>> "java.lang.OutOfMemoryError" will be thrown when i = 941.
>>>
>>> So, Logic of Application, please give more info about it.
>>>
>>> 2011/10/24 Wilson Barbosa <wilson.barbosa@gmail.com>
>>>
>>> > Dear All,
>>> >
>>> > I'm newbie using the SimpleODS, im getting in trouble when creating a big
>>> > spreadsheet (about 140k rows). When the sheet if growing the performance
>>> is
>>> > degrated and in a point i got "java.lang.OutOfMemory" exception. There is
>>> a
>>> > way of work in the sheet object like a file?
>>> >
>>> > The example of my code (reduced):
>>> > ...
>>> > // Cria objetos da planilha
>>> > SpreadsheetDocument planilha =
>>> > SpreadsheetDocument.newSpreadsheetDocument();
>>> > Table t = planilha.getSheetByIndex(proxima_sheet);
>>> > t.setTableName("WKS_" + proxima_sheet);
>>> > //
>>> > Cell c = null;
>>> > for(int i = 0; i<65000;i++){
>>> >  <<Logic of Application>>
>>> > }
>>> > ...
>>> >
>>> >
>>> Many thanks.,
>>> > Wilson.
>>> >
>>>
>>>
>>>
>>> --
>>> -Devin
>>>
>>
>

Mime
View raw message