asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Implement an SerializableVector in Hyracks
Date Fri, 15 Jan 2016 18:51:28 GMT
Agreed!

On 15 Jan 2016, at 10:30, Jianfeng Jia wrote:

> Sounds a positive support :-).
> One thing this Vector will sure better than plain array is to support 
> editing. (Update, Insert, Delete, even Append is better)
> It will have some overhead than the plain array as for the random 
> access. How much overhead it introduced will be an interesting 
> experiment to do after we have some initial implementation.
>
>> On Jan 15, 2016, at 9:00 AM, Till Westmann <tillw@apache.org> wrote:
>>
>> I don’t have relevant experience on the subject. But I think that 
>> it sounds good to avoid arbitrarily long chunks of memory. Especially 
>> - as Jianfeng wrote - it would be good to be able to a) account for 
>> this memory and b) to manage it.
>> An interesting question for me would be what the overhead of such a 
>> Vector is compared to a simple Java array and as a result where it 
>> should be used to replace arrays. (The comparison in [3] only 
>> compares different Scala collections, but doesn’t look at plain 
>> arrays.)
>>
>> Cheers,
>> Till
>>
>> On 14 Jan 2016, at 22:05, Chen Li wrote:
>>
>>> Before we ask Xi to work on this project, it will be good to know if
>>> other people have seen similar problems and agree with this plan.
>>> @Till: can you share some tips?
>>>
>>> Chen
>>>
>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia 
>>> <jianfeng.jia@gmail.com> wrote:
>>>> Hi Devs,
>>>>
>>>> First of all, Xi Zhang is a Master student at UCI wants to work 
>>>> with us for a while. Welcome Xi!
>>>>
>>>> We are thinking of making a Frame-based, memory-bound 
>>>> SerializableVector at first. We expect this vector can solve some 
>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>>>> Though we did a good job on organizing the record-located memory, 
>>>> the OOM exception can still happen while operating the auxiliary 
>>>> data structure. For example in the sort run generator, instead of 
>>>> moving record around we are creating an reference “pointer" array 
>>>> to the original record. However, if the record is small and the 
>>>> size of that int array will be large, then the OOM exception will 
>>>> occur, which is the case of issue [1].
>>>>
>>>> One way to solve this problem is to put auxiliary data structures 
>>>> into the memory-bounded frame as well. In general, it will be much 
>>>> easier to ask for multiple small memory blocks than one big chunk 
>>>> of memory. I guess that was the same reason why we have 
>>>> “SerializableHashTable” for HashJoin. It will be nice to have a 
>>>> more general structure that can be used by all the operators.
>>>>
>>>> The Frame based Vector idea is inspired by the Scala Vector[2] 
>>>> which looks like a List, but internally it is implemented as a 
>>>> 32-ary tree. The performance of it is very stable for variety size 
>>>> of object[3]. It will have all the benefits of ArrayList and the 
>>>> LinkedList. In addition, we can take the memory usage of the 
>>>> auxiliary structure into the calculation. We will work on the 
>>>> detailed design doc later if we are agree on this direction.
>>>>
>>>> Any thoughts or suggestions? Thank you!
>>>>
>>>>
>>>> [1] 
>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity

>>>> <https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity>
>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie 
>>>> <https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>>>> [3] 
>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ 
>>>> <http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>
>>>>
>>>> Best,
>>>>
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine

Mime
View raw message