asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Mon, 01 Feb 2016 23:16:05 GMT
Have you guys looked at the Drill ValueVectors?

This structure is being spun out as Apache Arrow with multiple interfaces
and language bindings.

On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:

> Hi,
>
> We plan to implement an append-only array at first. The main reason is
> this is how the auxiliary data structure be used so far. Then the
> implementation is straightforward.
>
> The tree-structured Vector in Scala can save a lot in updating case mainly
> because of their immutable requirement. It can saving unnecessary data
> copies comparing to other immutable list when updating. We only allow
> in-place update. The tree design may be overkill for us.
>
> Xi made on detailed design doc is here:
> https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing
> Any thoughts or comments?
>
>
> > On Jan 18, 2016, at 9:52 AM, Chen Li <chenli@gmail.com> wrote:
> >
> > @Xi and Jianfeng: after we come up with the design, let's share it with
> the
> > group for an approval before the implementation.
> >
> > Chen
> >
> > On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <dtabass@gmail.com> wrote:
> >
> >> The accounting is just as critical as the chunking - we should do both
> >> (together).
> >>
> >>
> >> On 1/15/16 9:00 AM, Till Westmann wrote:
> >>
> >>> I don’t have relevant experience on the subject. But I think that it
> >>> sounds good to avoid arbitrarily long chunks of memory. Especially - as
> >>> Jianfeng wrote - it would be good to be able to a) account for this
> memory
> >>> and b) to manage it.
> >>> An interesting question for me would be what the overhead of such a
> >>> Vector is compared to a simple Java array and as a result where it
> should
> >>> be used to replace arrays. (The comparison in [3] only compares
> different
> >>> Scala collections, but doesn’t look at plain arrays.)
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On 14 Jan 2016, at 22:05, Chen Li wrote:
> >>>
> >>> Before we ask Xi to work on this project, it will be good to know if
> >>>> other people have seen similar problems and agree with this plan.
> >>>> @Till: can you share some tips?
> >>>>
> >>>> Chen
> >>>>
> >>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <jianfeng.jia@gmail.com
> >
> >>>> wrote:
> >>>>
> >>>>> Hi Devs,
> >>>>>
> >>>>> First of all, Xi Zhang is a Master student at UCI wants to work
with
> us
> >>>>> for a while. Welcome Xi!
> >>>>>
> >>>>> We are thinking of making a Frame-based, memory-bound
> >>>>> SerializableVector at first. We expect this vector can solve some
> >>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
> >>>>> Though we did a good job on organizing the record-located memory,
the
> >>>>> OOM exception can still happen while operating the auxiliary data
> >>>>> structure. For example in the sort run generator, instead of moving
> record
> >>>>> around we are creating an reference “pointer" array to the original
> record.
> >>>>> However, if the record is small and the size of that int array will
> be
> >>>>> large, then the OOM exception will occur, which is the case of issue
> [1].
> >>>>>
> >>>>> One way to solve this problem is to put auxiliary data structures
> into
> >>>>> the memory-bounded frame as well. In general, it will be much easier
> to ask
> >>>>> for multiple small memory blocks than one big chunk of memory. I
> guess that
> >>>>> was the same reason why we have “SerializableHashTable” for
> HashJoin. It
> >>>>> will be nice to have a more general structure that can be used by
> all the
> >>>>> operators.
> >>>>>
> >>>>> The Frame based Vector idea is inspired by the Scala Vector[2] which
> >>>>> looks like a List, but internally it is implemented as a 32-ary
> tree. The
> >>>>> performance of it is very stable for variety size of object[3].
It
> will
> >>>>> have all the benefits of ArrayList and the LinkedList. In addition,
> we can
> >>>>> take the memory usage of the auxiliary structure into the
> calculation. We
> >>>>> will work on the detailed design doc later if we are agree on this
> >>>>> direction.
> >>>>>
> >>>>> Any thoughts or suggestions? Thank you!
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
> >>>>> <
> >>>>>
> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
> >
> >>>>>
> >>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie <
> >>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie>
> >>>>> [3]
> >>>>>
> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
> >>>>> <
> >>>>>
> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Jianfeng Jia
> >>>>> PhD Candidate of Computer Science
> >>>>> University of California, Irvine
> >>>>>
> >>>>>
> >>
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message