orc-dev mailing list archives

From "Renat Valiullin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ORC-458) [C++] Redesign of ColumnVectorBatch/ColumnWriter
Date Thu, 10 Jan 2019 01:42:00 GMT
Renat Valiullin created ORC-458:

             Summary: [C++] Redesign of ColumnVectorBatch/ColumnWriter 
                 Key: ORC-458
                 URL: https://issues.apache.org/jira/browse/ORC-458
             Project: ORC
          Issue Type: Improvement
          Components: C++
            Reporter: Renat Valiullin

The current implementation is inconvenient for nested types and has memory overhead, since
we have to construct the whole batch before adding it to the writer.

It would be better to give each batch a link to its ColumnWriter, so that data can be flushed
as soon as the batch is full:

listBatch = writer->createRowBatch(batchSize); // create batch tree
elementsBatch = listBatch->elements.get();

for (array : arrays) {
    for (element : array) {
        if (elementsBatch->size == batchSize) elementsBatch->add(); // reset batch size to 0
        elementsBatch->data[elementsBatch->size++] = element;
    }

    if (listBatch->size == batchSize) listBatch->add();
    listBatch->data[listBatch->size++] = array.size; // sizes, not offsets
}

writer->add(listBatch); // writeStripe() if needed
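
To make the idea concrete, here is a minimal self-contained C++ sketch of a batch that holds a link to its writer and flushes itself when full. Every name in it (SketchColumnWriter, SketchLongBatch, push, writeArrays) is hypothetical and is not part of the existing ORC C++ API. The "writer" only collects values in memory, and add() is assumed to mean "hand the buffered values to the writer and reset size to 0".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ColumnWriter: it only records what was flushed.
struct SketchColumnWriter {
    std::vector<std::int64_t> written;  // all values received so far
    std::size_t flushCount = 0;         // number of write() calls

    void write(const std::int64_t* data, std::size_t n) {
        written.insert(written.end(), data, data + n);
        ++flushCount;
    }
};

// Hypothetical fixed-capacity batch that keeps a link to its writer.
struct SketchLongBatch {
    std::vector<std::int64_t> data;  // fixed-capacity buffer
    std::size_t size = 0;            // number of valid entries in data
    SketchColumnWriter* writer;      // link to the owning writer (assumed design)

    SketchLongBatch(std::size_t capacity, SketchColumnWriter* w)
        : data(capacity), writer(w) {
        assert(capacity > 0);
    }

    // Assumed meaning of add(): flush buffered values, then reset size to 0.
    void add() {
        if (size == 0) return;
        writer->write(data.data(), size);
        size = 0;
    }

    // Append one value, flushing first if the batch is already full.
    void push(std::int64_t v) {
        if (size == data.size()) add();
        data[size++] = v;
    }
};

// Writes a list column: per-row sizes go to listBatch, values to elementsBatch.
inline void writeArrays(const std::vector<std::vector<std::int64_t>>& arrays,
                        SketchLongBatch& listBatch,
                        SketchLongBatch& elementsBatch) {
    for (const auto& array : arrays) {
        for (std::int64_t element : array) elementsBatch.push(element);
        listBatch.push(static_cast<std::int64_t>(array.size()));  // sizes, not offsets
    }
    // Final flush of whatever is still buffered.
    listBatch.add();
    elementsBatch.add();
}
```

In this sketch memory use is bounded by the batch capacity rather than by the total number of elements, because the child batch flushes independently of the parent. A real implementation would also have to handle nulls and decide when a stripe is written, which the sketch ignores.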
