asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chen Li <che...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Tue, 02 Feb 2016 06:02:01 GMT
Jianfeng and Xi: eventually please make sure to put the design doc to
the wiki site.

Chen

On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
> Hi,
>
> We plan to implement an append-only array at first. The main reason is this is how the
auxiliary data structure be used so far. Then the implementation is straightforward.
>
> The tree-structured Vector in Scala can save a lot in updating case mainly because of
their immutable requirement. It can saving unnecessary data copies comparing to other immutable
list when updating. We only allow in-place update. The tree design may be overkill for us.
>
> Xi made on detailed design doc is here: https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing
> Any thoughts or comments?
>
>
>> On Jan 18, 2016, at 9:52 AM, Chen Li <chenli@gmail.com> wrote:
>>
>> @Xi and Jianfeng: after we come up with the design, let's share it with the
>> group for an approval before the implementation.
>>
>> Chen
>>
>> On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <dtabass@gmail.com> wrote:
>>
>>> The accounting is just as critical as the chunking - we should do both
>>> (together).
>>>
>>>
>>> On 1/15/16 9:00 AM, Till Westmann wrote:
>>>
>>>> I don’t have relevant experience on the subject. But I think that it
>>>> sounds good to avoid arbitrarily long chunks of memory. Especially - as
>>>> Jianfeng wrote - it would be good to be able to a) account for this memory
>>>> and b) to manage it.
>>>> An interesting question for me would be what the overhead of such a
>>>> Vector is compared to a simple Java array and as a result where it should
>>>> be used to replace arrays. (The comparison in [3] only compares different
>>>> Scala collections, but doesn’t look at plain arrays.)
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On 14 Jan 2016, at 22:05, Chen Li wrote:
>>>>
>>>> Before we ask Xi to work on this project, it will be good to know if
>>>>> other people have seen similar problems and agree with this plan.
>>>>> @Till: can you share some tips?
>>>>>
>>>>> Chen
>>>>>
>>>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Devs,
>>>>>>
>>>>>> First of all, Xi Zhang is a Master student at UCI wants to work with
us
>>>>>> for a while. Welcome Xi!
>>>>>>
>>>>>> We are thinking of making a Frame-based, memory-bound
>>>>>> SerializableVector at first. We expect this vector can solve some
>>>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>>>>>> Though we did a good job on organizing the record-located memory,
the
>>>>>> OOM exception can still happen while operating the auxiliary data
>>>>>> structure. For example in the sort run generator, instead of moving
record
>>>>>> around we are creating an reference “pointer" array to the original
record.
>>>>>> However, if the record is small and the size of that int array will
be
>>>>>> large, then the OOM exception will occur, which is the case of issue
[1].
>>>>>>
>>>>>> One way to solve this problem is to put auxiliary data structures
into
>>>>>> the memory-bounded frame as well. In general, it will be much easier
to ask
>>>>>> for multiple small memory blocks than one big chunk of memory. I
guess that
>>>>>> was the same reason why we have “SerializableHashTable” for HashJoin.
It
>>>>>> will be nice to have a more general structure that can be used by
all the
>>>>>> operators.
>>>>>>
>>>>>> The Frame based Vector idea is inspired by the Scala Vector[2] which
>>>>>> looks like a List, but internally it is implemented as a 32-ary tree.
The
>>>>>> performance of it is very stable for variety size of object[3]. It
will
>>>>>> have all the benefits of ArrayList and the LinkedList. In addition,
we can
>>>>>> take the memory usage of the auxiliary structure into the calculation.
We
>>>>>> will work on the detailed design doc later if we are agree on this
>>>>>> direction.
>>>>>>
>>>>>> Any thoughts or suggestions? Thank you!
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
>>>>>> <
>>>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity>
>>>>>>
>>>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie <
>>>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>>>>>> [3]
>>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>>>>>> <
>>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jianfeng Jia
>>>>>> PhD Candidate of Computer Science
>>>>>> University of California, Irvine
>>>>>>
>>>>>>
>>>
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>

Mime
View raw message