arrow-dev mailing list archives

From "Ji Liu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ARROW-5259) Add option for ValueVector to allocate buffers with actual size
Date Sun, 05 May 2019 03:52:00 GMT
Ji Liu created ARROW-5259:
-----------------------------

             Summary: Add option for ValueVector to allocate buffers with actual size
                 Key: ARROW-5259
                 URL: https://issues.apache.org/jira/browse/ARROW-5259
             Project: Apache Arrow
          Issue Type: Wish
            Reporter: Ji Liu
            Assignee: Ji Liu


Currently, _BaseValueVector#computeCombinedBufferSize_ calculates the buffer size from
_valueCount_ and _typeWidth_ and then allocates memory for dataBuffer and validityBuffer.
However, it usually allocates more memory than is actually needed, because it calls _BaseAllocator.nextPowerOfTwo(bufferSize)_.

For example, an IntVector with valueCount = 1025 allocates buffers of size 8192, so memory
usage is almost double what is actually needed. In some cases there is enough memory for
the actual data, but an OOM is thrown once the allocation is rounded up to the next power of 2.
I think this problem is entirely avoidable.
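
The arithmetic behind this example can be sketched roughly as follows. This is an illustration only, not the Arrow source: the class and method names are hypothetical, and the 8-byte rounding of each buffer is an assumption about how the combined size is computed.

```java
// Illustrative sketch only; BufferSizeSketch and its methods are hypothetical
// names, not part of the Arrow API.
final class BufferSizeSketch {

    // Round a byte count up to a multiple of 8 (assumed alignment).
    static long roundUp8(long size) {
        return (size + 7) / 8 * 8;
    }

    // Smallest power of two that is >= size, for size >= 1.
    static long nextPowerOfTwo(long size) {
        long highest = Long.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }

    // Bytes actually needed: the data buffer plus a validity bitmap
    // holding one bit per value.
    static long actualSize(int valueCount, int typeWidth) {
        long data = roundUp8((long) valueCount * typeWidth);
        long validity = roundUp8((valueCount + 7L) / 8);
        return data + validity;
    }

    // Bytes allocated once the combined size is rounded up to a power of two.
    static long allocatedSize(int valueCount, int typeWidth) {
        return nextPowerOfTwo(actualSize(valueCount, typeWidth));
    }

    public static void main(String[] args) {
        // IntVector: typeWidth = 4 bytes, valueCount = 1025.
        long actual = actualSize(1025, 4);
        long allocated = allocatedSize(1025, 4);
        System.out.println("actual=" + actual + " allocated=" + allocated);
    }
}
```

Under these assumptions, 1025 four-byte values need 4240 bytes (4104 for data, 136 for validity), which is rounded up to 8192.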

Is it feasible to add an option for ValueVector to allocate the actual buffer size, rather than
rounding it up to the next power of 2, to reduce memory allocation?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
