asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chen Li <che...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Thu, 07 Apr 2016 05:27:53 GMT
OK.  Then we will not include Xi's code.  This subject is closed.


On Wed, Apr 6, 2016 at 12:56 PM, Mike Carey <dtabass@gmail.com> wrote:

> Agreed!
>
>
> On 4/6/16 10:14 AM, Till Westmann wrote:
>
>> Not really. Just talked with Yingyi as well and we agree with Jianfeng
>> that
>> taking the data structure memory usage into account is probably the most
>> important improvement here.
>>
>> Cheers,
>> Till
>>
>> On 6 Apr 2016, at 9:21, Chen Li wrote:
>>
>> @Till: do you have any thoughts on this subject before we close it?
>>>
>>> On Fri, Apr 1, 2016 at 2:08 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>> wrote:
>>>
>>> Dear devs,
>>>>
>>>> Xi has implemented the SerializableVector(SVector) which allocated
>>>> memory
>>>> increasingly rather than a long java int array upfront.
>>>> We did some tests but it seems not a good enough answer to solve the OOM
>>>> issue caused by long java array allocation.
>>>>
>>>> The first experiment was to test if the standalone SVector approach
>>>> would
>>>> get more memory comparing to the long int array. With the heap size of
>>>> -Xmx2g on a single machine, the SVector could allocate up to 1280M data,
>>>> while the original Java array can allocate up to 1080M data. It can
>>>> allocate about 18% more space.
>>>>
>>>> When we move to the AsterixDB, we replace the int array which stores the
>>>> (FrameId, TupleId) pair with the SVector. Then we gradually increasing
>>>> the
>>>> "compiler.sortmemory" to increase the in-memory buffer for external
>>>> sort.
>>>> Both master and SVector succeeded in 1150M case and failed at 1250M
>>>> case.
>>>> Furthermore, the master branch succeeded in 1200M case but the SVector
>>>> failed in that case.
>>>>
>>>> Given that, we may think that allocate the memory chunk by chunk doesn't
>>>> win too much space comparing to allocate a big one directly.
>>>> I think we need to take the structure memory usage into the spilling
>>>> case
>>>> calculation, but we may not need an extra structure to chop the memory.
>>>>
>>>> Any insights?
>>>> If you are interested, the code sample Xi wrote is here:
>>>> https://github.com/xixZ/JVMExperiment/tree/master/src/testing
>>>>
>>>>
>>>> On Tue, Feb 2, 2016 at 11:01 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>> wrote:
>>>>
>>>> Thanks Ted for the elaborate introduction!
>>>>> I did some reading about it. Based on my understanding, the main
>>>>>
>>>> advantage
>>>>
>>>>> of ValueVector is its column-wise design. In that case, the per-record
>>>>> based metadata, e.g. indexes or hash keys, can be directly added as an
>>>>> column along with the existing record. Thus, the sorting within one row
>>>>> group should be easily addressed. One question still not very clear to
>>>>> me
>>>>> is how to generate the sorted result across several row groups?
>>>>> If you can get somebody to talk more about it that will be great. Thank
>>>>> you!
>>>>>
>>>>> On Feb 1, 2016, at 6:03 PM, Ted Dunning <ted.dunning@gmail.com>
wrote:
>>>>>
>>>>> So there are several key points for ValueVectors that I can describe,
>>>>> but
>>>>> for the authoritative voice, others would have to speak.
>>>>>
>>>>> The first point is that in Drill at least (and this is not required)
>>>>> ValueVectors are off-heap.  This helps enormously in managing the
>>>>> life-cycle since vectors can be associated with queries and when the
>>>>>
>>>> query
>>>>
>>>>> ends, all associated vectors can be deallocated quickly.  That also
>>>>>
>>>> allows
>>>>
>>>>> the memory footprint of Drill to be adjusted both up and down while
>>>>> running.
>>>>>
>>>>> Secondly, ValueVectors are stored column-wise, not record-wise.  Most
>>>>> manipulations do not require copies. Projection simply requires
>>>>> ignoring
>>>>> some columns. New columns can be added without disturbing old ones.
>>>>> Filtering is done using a row selection bitmap. Sorting is often done
>>>>>
>>>> using
>>>>
>>>>> an index column.
>>>>>
>>>>> The assumption is also that you will have a row group with something
>>>>>
>>>> like a
>>>>
>>>>> hundred thousand rows in it. This means that the size of a single group
>>>>> isn't usually astronomical although very large data structures in a
>>>>>
>>>> single
>>>>
>>>>> row can make the regulation of the size of row groups more difficult.
>>>>>
>>>>> Of particular interest is the fact that processing code can be
>>>>> generated
>>>>>
>>>> in
>>>>
>>>>> Java that avoids almost all object creation so that most SQL-like
>>>>> queries
>>>>> don't require any object cons'ing at all during the row scans.
>>>>> Moreover,
>>>>> the code generated can even be rendered by the JIT into vectorized low
>>>>> level instructions because the access patterns on ValueVectors are so
>>>>> simple.
>>>>>
>>>>> Nested data structures are handled using invisible marking columns
>>>>>
>>>> similar
>>>>
>>>>> to the way that nesting is marked in Dremel or Parquet. This means that
>>>>>
>>>> you
>>>>
>>>>> get uniformly typed pseudo columns that represent a flattened view of
>>>>> the
>>>>> nested content. Many restructuring operations can be done by simply
>>>>> re-interpreting the nested data without any copying at all.
>>>>>
>>>>> If more detail is desired we should probably get somebody who is more
>>>>> active in the Drill implementation to talk about how this all works and
>>>>>
>>>> how
>>>>
>>>>> it will be extracted into Apache Arrow.
>>>>>
>>>>> More information can be found here:
>>>>>
>>>>> https://drill.apache.org/docs/value-vectors/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 1, 2016 at 5:08 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>>> wrote:
>>>>>
>>>>> If I understand correctly, it seems very similar to the IFrame in
>>>>> Hyrack,
>>>>> which is also a container to store a sequence of record into the
>>>>> ByteBuffer.
>>>>>
>>>>> I’m not clear about how records are manipulated inside the
>>>>> ValueVectors.
>>>>> In Hyracks case, we store the pointer(usually a pair of (frameID,
>>>>> recordID)) in one java array and manipulate the pointers instead of the
>>>>> original records. We mainly want to break the that one array into
>>>>>
>>>> multiple
>>>>
>>>>> small ByteBuffer-based arrays. By doing so, we can reduce the risk of
>>>>> OOM
>>>>> for a large array and we may also take the memory usage of those
>>>>> pointers
>>>>> into account for the flush decisions.  @Ted, could you share some
>>>>>
>>>> insights
>>>>
>>>>> about how the ValueVectors handles manipulations? e.g. sort, hashing
…
>>>>>
>>>> etc.
>>>>
>>>>>
>>>>> On Feb 1, 2016, at 3:16 PM, Ted Dunning <ted.dunning@gmail.com>
wrote:
>>>>>
>>>>> Have you guys looked at the Drill ValueVectors?
>>>>>
>>>>> This structure is being spun out as Apache Arrow with multiple
>>>>> interfaces
>>>>> and language bindings.
>>>>>
>>>>> On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>>>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> We plan to implement an append-only array at first. The main reason is
>>>>> this is how the auxiliary data structure be used so far. Then the
>>>>> implementation is straightforward.
>>>>>
>>>>> The tree-structured Vector in Scala can save a lot in updating case
>>>>>
>>>>> mainly
>>>>>
>>>>> because of their immutable requirement. It can saving unnecessary data
>>>>> copies comparing to other immutable list when updating. We only allow
>>>>> in-place update. The tree design may be overkill for us.
>>>>>
>>>>> Xi made on detailed design doc is here:
>>>>>
>>>>>
>>>>>
>>>>>
>>>> https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing
>>>>
>>>>>
>>>>> Any thoughts or comments?
>>>>>
>>>>>
>>>>> On Jan 18, 2016, at 9:52 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>
>>>>> @Xi and Jianfeng: after we come up with the design, let's share it with
>>>>>
>>>>> the
>>>>>
>>>>> group for an approval before the implementation.
>>>>>
>>>>> Chen
>>>>>
>>>>> On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <dtabass@gmail.com>
>>>>>
>>>>> wrote:
>>>>>
>>>>>
>>>>> The accounting is just as critical as the chunking - we should do both
>>>>> (together).
>>>>>
>>>>>
>>>>> On 1/15/16 9:00 AM, Till Westmann wrote:
>>>>>
>>>>> I don’t have relevant experience on the subject. But I think that it
>>>>> sounds good to avoid arbitrarily long chunks of memory. Especially -
>>>>>
>>>>> as
>>>>>
>>>>> Jianfeng wrote - it would be good to be able to a) account for this
>>>>>
>>>>> memory
>>>>>
>>>>> and b) to manage it.
>>>>> An interesting question for me would be what the overhead of such a
>>>>> Vector is compared to a simple Java array and as a result where it
>>>>>
>>>>> should
>>>>>
>>>>> be used to replace arrays. (The comparison in [3] only compares
>>>>>
>>>>> different
>>>>>
>>>>> Scala collections, but doesn’t look at plain arrays.)
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On 14 Jan 2016, at 22:05, Chen Li wrote:
>>>>>
>>>>> Before we ask Xi to work on this project, it will be good to know if
>>>>>
>>>>> other people have seen similar problems and agree with this plan.
>>>>> @Till: can you share some tips?
>>>>>
>>>>> Chen
>>>>>
>>>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <
>>>>>
>>>>> jianfeng.jia@gmail.com
>>>>>
>>>>>
>>>>> wrote:
>>>>>
>>>>> Hi Devs,
>>>>>
>>>>> First of all, Xi Zhang is a Master student at UCI wants to work
>>>>>
>>>>> with
>>>>>
>>>>> us
>>>>>
>>>>> for a while. Welcome Xi!
>>>>>
>>>>> We are thinking of making a Frame-based, memory-bound
>>>>> SerializableVector at first. We expect this vector can solve some
>>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>>>>> Though we did a good job on organizing the record-located memory,
>>>>>
>>>>> the
>>>>>
>>>>> OOM exception can still happen while operating the auxiliary data
>>>>> structure. For example in the sort run generator, instead of moving
>>>>>
>>>>> record
>>>>>
>>>>> around we are creating an reference “pointer" array to the original
>>>>>
>>>>> record.
>>>>>
>>>>> However, if the record is small and the size of that int array will
>>>>>
>>>>> be
>>>>>
>>>>> large, then the OOM exception will occur, which is the case of
>>>>>
>>>>> issue
>>>>>
>>>>> [1].
>>>>>
>>>>>
>>>>> One way to solve this problem is to put auxiliary data structures
>>>>>
>>>>> into
>>>>>
>>>>> the memory-bounded frame as well. In general, it will be much
>>>>>
>>>>> easier
>>>>>
>>>>> to ask
>>>>>
>>>>> for multiple small memory blocks than one big chunk of memory. I
>>>>>
>>>>> guess that
>>>>>
>>>>> was the same reason why we have “SerializableHashTable” for
>>>>>
>>>>> HashJoin. It
>>>>>
>>>>> will be nice to have a more general structure that can be used by
>>>>>
>>>>> all the
>>>>>
>>>>> operators.
>>>>>
>>>>> The Frame based Vector idea is inspired by the Scala Vector[2]
>>>>>
>>>>> which
>>>>>
>>>>> looks like a List, but internally it is implemented as a 32-ary
>>>>>
>>>>> tree. The
>>>>>
>>>>> performance of it is very stable for variety size of object[3]. It
>>>>>
>>>>> will
>>>>>
>>>>> have all the benefits of ArrayList and the LinkedList. In addition,
>>>>>
>>>>> we can
>>>>>
>>>>> take the memory usage of the auxiliary structure into the
>>>>>
>>>>> calculation. We
>>>>>
>>>>> will work on the detailed design doc later if we are agree on this
>>>>> direction.
>>>>>
>>>>> Any thoughts or suggestions? Thank you!
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
>>>>
>>>>>
>>>>> <
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
>>>>
>>>>>
>>>>>
>>>>>
>>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie <
>>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>>>>> [3]
>>>>>
>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>>>>>
>>>>> <
>>>>>
>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> -----------------
>>>> Best Regards
>>>>
>>>> Jianfeng Jia
>>>> Ph.D. Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message