asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Carey <dtab...@gmail.com>
Subject Re: Implement an SerializableVector in Hyracks
Date Wed, 06 Apr 2016 19:56:36 GMT
Agreed!

On 4/6/16 10:14 AM, Till Westmann wrote:
> Not really. Just talked with Yingyi as well and we agree with Jianfeng 
> that
> taking the data structure memory usage into account is probably the most
> important improvement here.
>
> Cheers,
> Till
>
> On 6 Apr 2016, at 9:21, Chen Li wrote:
>
>> @Till: do you have any thoughts on this subject before we close it?
>>
>> On Fri, Apr 1, 2016 at 2:08 PM, Jianfeng Jia <jianfeng.jia@gmail.com> 
>> wrote:
>>
>>> Dear devs,
>>>
>>> Xi has implemented the SerializableVector(SVector) which allocated 
>>> memory
>>> increasingly rather than a long java int array upfront.
>>> We did some tests but it seems not a good enough answer to solve the 
>>> OOM
>>> issue caused by long java array allocation.
>>>
>>> The first experiment was to test if the standalone SVector approach 
>>> would
>>> get more memory comparing to the long int array. With the heap size of
>>> -Xmx2g on a single machine, the SVector could allocate up to 1280M 
>>> data,
>>> while the original Java array can allocate up to 1080M data. It can
>>> allocate about 18% more space.
>>>
>>> When we move to the AsterixDB, we replace the int array which stores 
>>> the
>>> (FrameId, TupleId) pair with the SVector. Then we gradually 
>>> increasing the
>>> "compiler.sortmemory" to increase the in-memory buffer for external 
>>> sort.
>>> Both master and SVector succeeded in 1150M case and failed at 1250M 
>>> case.
>>> Furthermore, the master branch succeeded in 1200M case but the SVector
>>> failed in that case.
>>>
>>> Given that, we may think that allocate the memory chunk by chunk 
>>> doesn't
>>> win too much space comparing to allocate a big one directly.
>>> I think we need to take the structure memory usage into the spilling 
>>> case
>>> calculation, but we may not need an extra structure to chop the memory.
>>>
>>> Any insights?
>>> If you are interested, the code sample Xi wrote is here:
>>> https://github.com/xixZ/JVMExperiment/tree/master/src/testing
>>>
>>>
>>> On Tue, Feb 2, 2016 at 11:01 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>> wrote:
>>>
>>>> Thanks Ted for the elaborate introduction!
>>>> I did some reading about it. Based on my understanding, the main
>>> advantage
>>>> of ValueVector is its column-wise design. In that case, the per-record
>>>> based metadata, e.g. indexes or hash keys, can be directly added as an
>>>> column along with the existing record. Thus, the sorting within one 
>>>> row
>>>> group should be easily addressed. One question still not very clear 
>>>> to me
>>>> is how to generate the sorted result across several row groups?
>>>> If you can get somebody to talk more about it that will be great. 
>>>> Thank
>>>> you!
>>>>
>>>> On Feb 1, 2016, at 6:03 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>
>>>> So there are several key points for ValueVectors that I can 
>>>> describe, but
>>>> for the authoritative voice, others would have to speak.
>>>>
>>>> The first point is that in Drill at least (and this is not required)
>>>> ValueVectors are off-heap.  This helps enormously in managing the
>>>> life-cycle since vectors can be associated with queries and when the
>>> query
>>>> ends, all associated vectors can be deallocated quickly.  That also
>>> allows
>>>> the memory footprint of Drill to be adjusted both up and down while
>>>> running.
>>>>
>>>> Secondly, ValueVectors are stored column-wise, not record-wise.  Most
>>>> manipulations do not require copies. Projection simply requires 
>>>> ignoring
>>>> some columns. New columns can be added without disturbing old ones.
>>>> Filtering is done using a row selection bitmap. Sorting is often done
>>> using
>>>> an index column.
>>>>
>>>> The assumption is also that you will have a row group with something
>>> like a
>>>> hundred thousand rows in it. This means that the size of a single 
>>>> group
>>>> isn't usually astronomical although very large data structures in a
>>> single
>>>> row can make the regulation of the size of row groups more difficult.
>>>>
>>>> Of particular interest is the fact that processing code can be 
>>>> generated
>>> in
>>>> Java that avoids almost all object creation so that most SQL-like 
>>>> queries
>>>> don't require any object cons'ing at all during the row scans. 
>>>> Moreover,
>>>> the code generated can even be rendered by the JIT into vectorized low
>>>> level instructions because the access patterns on ValueVectors are so
>>>> simple.
>>>>
>>>> Nested data structures are handled using invisible marking columns
>>> similar
>>>> to the way that nesting is marked in Dremel or Parquet. This means 
>>>> that
>>> you
>>>> get uniformly typed pseudo columns that represent a flattened view 
>>>> of the
>>>> nested content. Many restructuring operations can be done by simply
>>>> re-interpreting the nested data without any copying at all.
>>>>
>>>> If more detail is desired we should probably get somebody who is more
>>>> active in the Drill implementation to talk about how this all works 
>>>> and
>>> how
>>>> it will be extracted into Apache Arrow.
>>>>
>>>> More information can be found here:
>>>>
>>>> https://drill.apache.org/docs/value-vectors/
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 1, 2016 at 5:08 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>> wrote:
>>>>
>>>> If I understand correctly, it seems very similar to the IFrame in 
>>>> Hyrack,
>>>> which is also a container to store a sequence of record into the
>>>> ByteBuffer.
>>>>
>>>> I’m not clear about how records are manipulated inside the 
>>>> ValueVectors.
>>>> In Hyracks case, we store the pointer(usually a pair of (frameID,
>>>> recordID)) in one java array and manipulate the pointers instead of 
>>>> the
>>>> original records. We mainly want to break the that one array into
>>> multiple
>>>> small ByteBuffer-based arrays. By doing so, we can reduce the risk 
>>>> of OOM
>>>> for a large array and we may also take the memory usage of those 
>>>> pointers
>>>> into account for the flush decisions.  @Ted, could you share some
>>> insights
>>>> about how the ValueVectors handles manipulations? e.g. sort, hashing …
>>> etc.
>>>>
>>>> On Feb 1, 2016, at 3:16 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>
>>>> Have you guys looked at the Drill ValueVectors?
>>>>
>>>> This structure is being spun out as Apache Arrow with multiple 
>>>> interfaces
>>>> and language bindings.
>>>>
>>>> On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>>
>>>> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> We plan to implement an append-only array at first. The main reason is
>>>> this is how the auxiliary data structure be used so far. Then the
>>>> implementation is straightforward.
>>>>
>>>> The tree-structured Vector in Scala can save a lot in updating case
>>>>
>>>> mainly
>>>>
>>>> because of their immutable requirement. It can saving unnecessary data
>>>> copies comparing to other immutable list when updating. We only allow
>>>> in-place update. The tree design may be overkill for us.
>>>>
>>>> Xi made on detailed design doc is here:
>>>>
>>>>
>>>>
>>> https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing

>>>
>>>>
>>>> Any thoughts or comments?
>>>>
>>>>
>>>> On Jan 18, 2016, at 9:52 AM, Chen Li <chenli@gmail.com> wrote:
>>>>
>>>> @Xi and Jianfeng: after we come up with the design, let's share it 
>>>> with
>>>>
>>>> the
>>>>
>>>> group for an approval before the implementation.
>>>>
>>>> Chen
>>>>
>>>> On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <dtabass@gmail.com>
>>>>
>>>> wrote:
>>>>
>>>>
>>>> The accounting is just as critical as the chunking - we should do both
>>>> (together).
>>>>
>>>>
>>>> On 1/15/16 9:00 AM, Till Westmann wrote:
>>>>
>>>> I don’t have relevant experience on the subject. But I think that it
>>>> sounds good to avoid arbitrarily long chunks of memory. Especially -
>>>>
>>>> as
>>>>
>>>> Jianfeng wrote - it would be good to be able to a) account for this
>>>>
>>>> memory
>>>>
>>>> and b) to manage it.
>>>> An interesting question for me would be what the overhead of such a
>>>> Vector is compared to a simple Java array and as a result where it
>>>>
>>>> should
>>>>
>>>> be used to replace arrays. (The comparison in [3] only compares
>>>>
>>>> different
>>>>
>>>> Scala collections, but doesn’t look at plain arrays.)
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On 14 Jan 2016, at 22:05, Chen Li wrote:
>>>>
>>>> Before we ask Xi to work on this project, it will be good to know if
>>>>
>>>> other people have seen similar problems and agree with this plan.
>>>> @Till: can you share some tips?
>>>>
>>>> Chen
>>>>
>>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <
>>>>
>>>> jianfeng.jia@gmail.com
>>>>
>>>>
>>>> wrote:
>>>>
>>>> Hi Devs,
>>>>
>>>> First of all, Xi Zhang is a Master student at UCI wants to work
>>>>
>>>> with
>>>>
>>>> us
>>>>
>>>> for a while. Welcome Xi!
>>>>
>>>> We are thinking of making a Frame-based, memory-bound
>>>> SerializableVector at first. We expect this vector can solve some
>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>>>> Though we did a good job on organizing the record-located memory,
>>>>
>>>> the
>>>>
>>>> OOM exception can still happen while operating the auxiliary data
>>>> structure. For example in the sort run generator, instead of moving
>>>>
>>>> record
>>>>
>>>> around we are creating an reference “pointer" array to the original
>>>>
>>>> record.
>>>>
>>>> However, if the record is small and the size of that int array will
>>>>
>>>> be
>>>>
>>>> large, then the OOM exception will occur, which is the case of
>>>>
>>>> issue
>>>>
>>>> [1].
>>>>
>>>>
>>>> One way to solve this problem is to put auxiliary data structures
>>>>
>>>> into
>>>>
>>>> the memory-bounded frame as well. In general, it will be much
>>>>
>>>> easier
>>>>
>>>> to ask
>>>>
>>>> for multiple small memory blocks than one big chunk of memory. I
>>>>
>>>> guess that
>>>>
>>>> was the same reason why we have “SerializableHashTable” for
>>>>
>>>> HashJoin. It
>>>>
>>>> will be nice to have a more general structure that can be used by
>>>>
>>>> all the
>>>>
>>>> operators.
>>>>
>>>> The Frame based Vector idea is inspired by the Scala Vector[2]
>>>>
>>>> which
>>>>
>>>> looks like a List, but internally it is implemented as a 32-ary
>>>>
>>>> tree. The
>>>>
>>>> performance of it is very stable for variety size of object[3]. It
>>>>
>>>> will
>>>>
>>>> have all the benefits of ArrayList and the LinkedList. In addition,
>>>>
>>>> we can
>>>>
>>>> take the memory usage of the auxiliary structure into the
>>>>
>>>> calculation. We
>>>>
>>>> will work on the detailed design doc later if we are agree on this
>>>> direction.
>>>>
>>>> Any thoughts or suggestions? Thank you!
>>>>
>>>>
>>>> [1]
>>>>
>>>>
>>>>
>>>>
>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity

>>>
>>>>
>>>> <
>>>>
>>>>
>>>>
>>>>
>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity

>>>
>>>>
>>>>
>>>>
>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie <
>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>>>> [3]
>>>>
>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ 
>>>>
>>>>
>>>> <
>>>>
>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ 
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>>>>
>>>
>>>
>>> -- 
>>>
>>> -----------------
>>> Best Regards
>>>
>>> Jianfeng Jia
>>> Ph.D. Candidate of Computer Science
>>> University of California, Irvine
>>>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message