asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Implement an SerializableVector in Hyracks
Date Fri, 15 Jan 2016 17:00:46 GMT
I don’t have relevant experience on the subject. But I think that it 
sounds good to avoid arbitrarily long chunks of memory. Especially - as 
Jianfeng wrote - it would be good to be able to a) account for this 
memory and b) to manage it.
An interesting question for me would be what the overhead of such a 
Vector is compared to a simple Java array and as a result where it 
should be used to replace arrays. (The comparison in [3] only compares 
different Scala collections, but doesn’t look at plain arrays.)

Cheers,
Till

On 14 Jan 2016, at 22:05, Chen Li wrote:

> Before we ask Xi to work on this project, it will be good to know if
> other people have seen similar problems and agree with this plan.
> @Till: can you share some tips?
>
> Chen
>
> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <jianfeng.jia@gmail.com> 
> wrote:
>> Hi Devs,
>>
>> First of all, Xi Zhang is a Master student at UCI wants to work with 
>> us for a while. Welcome Xi!
>>
>> We are thinking of making a Frame-based, memory-bound 
>> SerializableVector at first. We expect this vector can solve some 
>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>> Though we did a good job on organizing the record-located memory, the 
>> OOM exception can still happen while operating the auxiliary data 
>> structure. For example in the sort run generator, instead of moving 
>> record around we are creating an reference “pointer" array to the 
>> original record. However, if the record is small and the size of that 
>> int array will be large, then the OOM exception will occur, which is 
>> the case of issue [1].
>>
>> One way to solve this problem is to put auxiliary data structures 
>> into the memory-bounded frame as well. In general, it will be much 
>> easier to ask for multiple small memory blocks than one big chunk of 
>> memory. I guess that was the same reason why we have 
>> “SerializableHashTable” for HashJoin. It will be nice to have a 
>> more general structure that can be used by all the operators.
>>
>> The Frame based Vector idea is inspired by the Scala Vector[2] which 
>> looks like a List, but internally it is implemented as a 32-ary tree. 
>> The performance of it is very stable for variety size of object[3]. 
>> It will have all the benefits of ArrayList and the LinkedList. In 
>> addition, we can take the memory usage of the auxiliary structure 
>> into the calculation. We will work on the detailed design doc later 
>> if we are agree on this direction.
>>
>> Any thoughts or suggestions? Thank you!
>>
>>
>> [1] 
>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity

>> <https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity>
>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie 
>> <https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>> [3] 
>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ 
>> <http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>
>>
>> Best,
>>
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>>

Mime
View raw message