asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Carey <dtab...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Fri, 15 Jan 2016 19:48:50 GMT
The accounting is just as critical as the chunking - we should do both 
(together).

On 1/15/16 9:00 AM, Till Westmann wrote:
> I don’t have relevant experience on the subject. But I think that it 
> sounds good to avoid arbitrarily long chunks of memory. Especially - 
> as Jianfeng wrote - it would be good to be able to a) account for this 
> memory and b) to manage it.
> An interesting question for me would be what the overhead of such a 
> Vector is compared to a simple Java array and as a result where it 
> should be used to replace arrays. (The comparison in [3] only compares 
> different Scala collections, but doesn’t look at plain arrays.)
>
> Cheers,
> Till
>
> On 14 Jan 2016, at 22:05, Chen Li wrote:
>
>> Before we ask Xi to work on this project, it will be good to know if
>> other people have seen similar problems and agree with this plan.
>> @Till: can you share some tips?
>>
>> Chen
>>
>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia 
>> <jianfeng.jia@gmail.com> wrote:
>>> Hi Devs,
>>>
>>> First of all, Xi Zhang is a Master student at UCI wants to work with 
>>> us for a while. Welcome Xi!
>>>
>>> We are thinking of making a Frame-based, memory-bound 
>>> SerializableVector at first. We expect this vector can solve some 
>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>>> Though we did a good job on organizing the record-located memory, 
>>> the OOM exception can still happen while operating the auxiliary 
>>> data structure. For example in the sort run generator, instead of 
>>> moving record around we are creating an reference “pointer" array to 
>>> the original record. However, if the record is small and the size of 
>>> that int array will be large, then the OOM exception will occur, 
>>> which is the case of issue [1].
>>>
>>> One way to solve this problem is to put auxiliary data structures 
>>> into the memory-bounded frame as well. In general, it will be much 
>>> easier to ask for multiple small memory blocks than one big chunk of 
>>> memory. I guess that was the same reason why we have 
>>> “SerializableHashTable” for HashJoin. It will be nice to have a more 
>>> general structure that can be used by all the operators.
>>>
>>> The Frame based Vector idea is inspired by the Scala Vector[2] which 
>>> looks like a List, but internally it is implemented as a 32-ary 
>>> tree. The performance of it is very stable for variety size of 
>>> object[3]. It will have all the benefits of ArrayList and the 
>>> LinkedList. In addition, we can take the memory usage of the 
>>> auxiliary structure into the calculation. We will work on the 
>>> detailed design doc later if we are agree on this direction.
>>>
>>> Any thoughts or suggestions? Thank you!
>>>
>>>
>>> [1] 
>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity

>>> <https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity>

>>>
>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie 
>>> <https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>>> [3] 
>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ 
>>> <http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>

>>>
>>>
>>> Best,
>>>
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>>>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message