asterixdb-dev mailing list archives

From Chen Li <che...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Fri, 15 Jan 2016 06:05:14 GMT
Before we ask Xi to work on this project, it would be good to know whether
other people have seen similar problems and agree with this plan.
@Till: can you share some tips?

Chen

On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
> Hi Devs,
>
> First of all, Xi Zhang, a Master's student at UCI, wants to work with us for a while.
> Welcome, Xi!
>
> We are thinking of first building a frame-based, memory-bounded SerializableVector. We
> expect this vector to resolve some occasional Java heap OutOfMemory exceptions in Hyracks.
> Though we did a good job of organizing the record-located memory, the OOM exception can
> still happen while operating on the auxiliary data structures. For example, in the sort run
> generator, instead of moving records around we create an array of reference "pointers" to the
> original records. However, if the records are small, that int array becomes large and the
> OOM exception can occur, which is the case in issue [1].
>
> One way to solve this problem is to put the auxiliary data structures into memory-bounded
> frames as well. In general, it is much easier to ask for multiple small memory blocks
> than for one big chunk of memory. I guess that is the same reason we have "SerializableHashTable"
> for HashJoin. It would be nice to have a more general structure that can be used by all the
> operators.
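
To make the "many small blocks instead of one big chunk" idea concrete, here is a minimal sketch of an append-only int vector backed by fixed-size buffers. It is only an illustration under stated assumptions: the class name FrameIntVector and the parameters frameSizeBytes and maxFrames are hypothetical, and it uses plain java.nio.ByteBuffer rather than any actual Hyracks frame API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Hyracks code): an append-only int vector that
 * stores its values in fixed-size "frames" and refuses to grow past a budget.
 */
final class FrameIntVector {
    private final int frameSizeBytes;
    private final int intsPerFrame;
    private final int maxFrames;
    private final List<ByteBuffer> frames = new ArrayList<>();
    private int size = 0;

    FrameIntVector(int frameSizeBytes, int maxFrames) {
        if (frameSizeBytes < Integer.BYTES || maxFrames < 1) {
            throw new IllegalArgumentException("frame must hold at least one int");
        }
        this.frameSizeBytes = frameSizeBytes;
        this.intsPerFrame = frameSizeBytes / Integer.BYTES;
        this.maxFrames = maxFrames;
    }

    /** Appends a value; returns false (instead of allocating) once the frame budget is used up. */
    boolean append(int value) {
        int frameIndex = size / intsPerFrame;
        if (frameIndex == frames.size()) {
            if (frames.size() == maxFrames) {
                return false; // caller can react, e.g. by flushing a sorted run
            }
            frames.add(ByteBuffer.allocate(frameSizeBytes)); // one small block at a time
        }
        int offset = (size % intsPerFrame) * Integer.BYTES;
        frames.get(frameIndex).putInt(offset, value);
        size++;
        return true;
    }

    /** Reads the value at the given position by locating its frame and byte offset. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int offset = (index % intsPerFrame) * Integer.BYTES;
        return frames.get(index / intsPerFrame).getInt(offset);
    }

    int size() {
        return size;
    }

    int framesInUse() {
        return frames.size();
    }
}
```

With this shape, the memory held by the "pointer" array is always framesInUse() times the frame size, so it can be counted against an operator's budget, and a full vector shows up as a false return value rather than an OOM.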
>
> The frame-based Vector idea is inspired by the Scala Vector [2], which looks like a List
> but is internally implemented as a 32-ary tree. Its performance is very stable across
> objects of various sizes [3]. It should offer the benefits of both ArrayList and LinkedList.
> In addition, we can include the memory usage of the auxiliary structure in the calculation.
> We will work on a detailed design doc later if we agree on this direction.
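
As a rough illustration of how lookup in a 32-ary tree can work, the sketch below splits an element index into 5-bit digits, one per tree level, and uses each digit as the child slot to follow. The class and method names (TriePath, slots, depthFor) are hypothetical; this shows only the index arithmetic, not how a frame-based vector in Hyracks would actually be laid out.

```java
/**
 * Hypothetical sketch: index arithmetic for a 32-ary tree.
 * Each level consumes 5 bits of the index, because 2^5 = 32 children per node.
 */
final class TriePath {
    static final int BITS = 5;
    static final int MASK = (1 << BITS) - 1; // 31, selects one 5-bit digit

    private TriePath() {
    }

    /** Returns the child slot to follow at each level, from the root down to the leaf. */
    static int[] slots(int index, int depth) {
        int[] path = new int[depth];
        for (int level = 0; level < depth; level++) {
            int shift = BITS * (depth - 1 - level); // the root uses the highest digit
            path[level] = (index >>> shift) & MASK;
        }
        return path;
    }

    /** Smallest depth whose capacity (32^depth) can hold the given number of elements. */
    static int depthFor(int size) {
        int depth = 1;
        long capacity = 1L << BITS;
        while (capacity < size) {
            capacity <<= BITS;
            depth++;
        }
        return depth;
    }
}
```

Because the depth grows only with the base-32 logarithm of the size, a vector of about a million entries needs four levels, which is why random access stays close to array-like cost while each node remains a small, fixed-size block.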
>
> Any thoughts or suggestions? Thank you!
>
>
> [1] https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie
> [3] http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
