asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chen Li <che...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Wed, 06 Apr 2016 16:21:01 GMT
@Till: do you have any thoughts on this subject before we close it?

On Fri, Apr 1, 2016 at 2:08 PM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:

> Dear devs,
>
> Xi has implemented the SerializableVector(SVector) which allocated memory
> increasingly rather than a long java int array upfront.
> We did some tests but it seems not a good enough answer to solve the OOM
> issue caused by long java array allocation.
>
> The first experiment was to test if the standalone SVector approach would
> get more memory comparing to the long int array. With the heap size of
> -Xmx2g on a single machine, the SVector could allocate up to 1280M data,
> while the original Java array can allocate up to 1080M data. It can
> allocate about 18% more space.
>
> When we move to the AsterixDB, we replace the int array which stores the
> (FrameId, TupleId) pair with the SVector. Then we gradually increasing the
> "compiler.sortmemory" to increase the in-memory buffer for external sort.
> Both master and SVector succeeded in 1150M case and failed at 1250M case.
> Furthermore, the master branch succeeded in 1200M case but the SVector
> failed in that case.
>
> Given that, we may think that allocate the memory chunk by chunk doesn't
> win too much space comparing to allocate a big one directly.
> I think we need to take the structure memory usage into the spilling case
> calculation, but we may not need an extra structure to chop the memory.
>
> Any insights?
> If you are interested, the code sample Xi wrote is here:
> https://github.com/xixZ/JVMExperiment/tree/master/src/testing
>
>
> On Tue, Feb 2, 2016 at 11:01 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
> wrote:
>
> > Thanks Ted for the elaborate introduction!
> > I did some reading about it. Based on my understanding, the main
> advantage
> > of ValueVector is its column-wise design. In that case, the per-record
> > based metadata, e.g. indexes or hash keys, can be directly added as an
> > column along with the existing record. Thus, the sorting within one row
> > group should be easily addressed. One question still not very clear to me
> > is how to generate the sorted result across several row groups?
> > If you can get somebody to talk more about it that will be great. Thank
> > you!
> >
> > On Feb 1, 2016, at 6:03 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > So there are several key points for ValueVectors that I can describe, but
> > for the authoritative voice, others would have to speak.
> >
> > The first point is that in Drill at least (and this is not required)
> > ValueVectors are off-heap.  This helps enormously in managing the
> > life-cycle since vectors can be associated with queries and when the
> query
> > ends, all associated vectors can be deallocated quickly.  That also
> allows
> > the memory footprint of Drill to be adjusted both up and down while
> > running.
> >
> > Secondly, ValueVectors are stored column-wise, not record-wise.  Most
> > manipulations do not require copies. Projection simply requires ignoring
> > some columns. New columns can be added without disturbing old ones.
> > Filtering is done using a row selection bitmap. Sorting is often done
> using
> > an index column.
> >
> > The assumption is also that you will have a row group with something
> like a
> > hundred thousand rows in it. This means that the size of a single group
> > isn't usually astronomical although very large data structures in a
> single
> > row can make the regulation of the size of row groups more difficult.
> >
> > Of particular interest is the fact that processing code can be generated
> in
> > Java that avoids almost all object creation so that most SQL-like queries
> > don't require any object cons'ing at all during the row scans. Moreover,
> > the code generated can even be rendered by the JIT into vectorized low
> > level instructions because the access patterns on ValueVectors are so
> > simple.
> >
> > Nested data structures are handled using invisible marking columns
> similar
> > to the way that nesting is marked in Dremel or Parquet. This means that
> you
> > get uniformly typed pseudo columns that represent a flattened view of the
> > nested content. Many restructuring operations can be done by simply
> > re-interpreting the nested data without any copying at all.
> >
> > If more detail is desired we should probably get somebody who is more
> > active in the Drill implementation to talk about how this all works and
> how
> > it will be extracted into Apache Arrow.
> >
> > More information can be found here:
> >
> > https://drill.apache.org/docs/value-vectors/
> >
> >
> >
> >
> > On Mon, Feb 1, 2016 at 5:08 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
> > wrote:
> >
> > If I understand correctly, it seems very similar to the IFrame in Hyrack,
> > which is also a container to store a sequence of record into the
> > ByteBuffer.
> >
> > I’m not clear about how records are manipulated inside the ValueVectors.
> > In Hyracks case, we store the pointer(usually a pair of (frameID,
> > recordID)) in one java array and manipulate the pointers instead of the
> > original records. We mainly want to break the that one array into
> multiple
> > small ByteBuffer-based arrays. By doing so, we can reduce the risk of OOM
> > for a large array and we may also take the memory usage of those pointers
> > into account for the flush decisions.  @Ted, could you share some
> insights
> > about how the ValueVectors handles manipulations? e.g. sort, hashing …
> etc.
> >
> > On Feb 1, 2016, at 3:16 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > Have you guys looked at the Drill ValueVectors?
> >
> > This structure is being spun out as Apache Arrow with multiple interfaces
> > and language bindings.
> >
> > On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <jianfeng.jia@gmail.com>
> >
> > wrote:
> >
> >
> > Hi,
> >
> > We plan to implement an append-only array at first. The main reason is
> > this is how the auxiliary data structure be used so far. Then the
> > implementation is straightforward.
> >
> > The tree-structured Vector in Scala can save a lot in updating case
> >
> > mainly
> >
> > because of their immutable requirement. It can saving unnecessary data
> > copies comparing to other immutable list when updating. We only allow
> > in-place update. The tree design may be overkill for us.
> >
> > Xi made on detailed design doc is here:
> >
> >
> >
> https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing
> >
> > Any thoughts or comments?
> >
> >
> > On Jan 18, 2016, at 9:52 AM, Chen Li <chenli@gmail.com> wrote:
> >
> > @Xi and Jianfeng: after we come up with the design, let's share it with
> >
> > the
> >
> > group for an approval before the implementation.
> >
> > Chen
> >
> > On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <dtabass@gmail.com>
> >
> > wrote:
> >
> >
> > The accounting is just as critical as the chunking - we should do both
> > (together).
> >
> >
> > On 1/15/16 9:00 AM, Till Westmann wrote:
> >
> > I don’t have relevant experience on the subject. But I think that it
> > sounds good to avoid arbitrarily long chunks of memory. Especially -
> >
> > as
> >
> > Jianfeng wrote - it would be good to be able to a) account for this
> >
> > memory
> >
> > and b) to manage it.
> > An interesting question for me would be what the overhead of such a
> > Vector is compared to a simple Java array and as a result where it
> >
> > should
> >
> > be used to replace arrays. (The comparison in [3] only compares
> >
> > different
> >
> > Scala collections, but doesn’t look at plain arrays.)
> >
> > Cheers,
> > Till
> >
> > On 14 Jan 2016, at 22:05, Chen Li wrote:
> >
> > Before we ask Xi to work on this project, it will be good to know if
> >
> > other people have seen similar problems and agree with this plan.
> > @Till: can you share some tips?
> >
> > Chen
> >
> > On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <
> >
> > jianfeng.jia@gmail.com
> >
> >
> > wrote:
> >
> > Hi Devs,
> >
> > First of all, Xi Zhang is a Master student at UCI wants to work
> >
> > with
> >
> > us
> >
> > for a while. Welcome Xi!
> >
> > We are thinking of making a Frame-based, memory-bound
> > SerializableVector at first. We expect this vector can solve some
> > occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
> > Though we did a good job on organizing the record-located memory,
> >
> > the
> >
> > OOM exception can still happen while operating the auxiliary data
> > structure. For example in the sort run generator, instead of moving
> >
> > record
> >
> > around we are creating an reference “pointer" array to the original
> >
> > record.
> >
> > However, if the record is small and the size of that int array will
> >
> > be
> >
> > large, then the OOM exception will occur, which is the case of
> >
> > issue
> >
> > [1].
> >
> >
> > One way to solve this problem is to put auxiliary data structures
> >
> > into
> >
> > the memory-bounded frame as well. In general, it will be much
> >
> > easier
> >
> > to ask
> >
> > for multiple small memory blocks than one big chunk of memory. I
> >
> > guess that
> >
> > was the same reason why we have “SerializableHashTable” for
> >
> > HashJoin. It
> >
> > will be nice to have a more general structure that can be used by
> >
> > all the
> >
> > operators.
> >
> > The Frame based Vector idea is inspired by the Scala Vector[2]
> >
> > which
> >
> > looks like a List, but internally it is implemented as a 32-ary
> >
> > tree. The
> >
> > performance of it is very stable for variety size of object[3]. It
> >
> > will
> >
> > have all the benefits of ArrayList and the LinkedList. In addition,
> >
> > we can
> >
> > take the memory usage of the auxiliary structure into the
> >
> > calculation. We
> >
> > will work on the detailed design doc later if we are agree on this
> > direction.
> >
> > Any thoughts or suggestions? Thank you!
> >
> >
> > [1]
> >
> >
> >
> >
> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
> >
> > <
> >
> >
> >
> >
> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
> >
> >
> >
> > [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie <
> > https://bitbucket.org/astrieanna/bitmapped-vector-trie>
> > [3]
> >
> > http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
> >
> > <
> >
> > http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
>
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message