arrow-dev mailing list archives

From Kiril Menshikov <kmenshi...@gmail.com>
Subject Re: [JAVA] Figuring out whats shifted from Drill/Java
Date Fri, 01 Jul 2016 22:16:56 GMT
Hi,

I’m back from vacation and have finalized the offset test.

I found that memory is not necessarily 64-byte aligned, so the offset is now calculated before
ArrowBuf initialization.

https://github.com/apache/arrow/pull/98
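
For readers following along, here is a minimal sketch of the kind of offset calculation described above. It is not the code from the pull request: the class name AlignmentSketch, the method alignmentOffset, and the example address are all hypothetical, and it assumes only that the raw memory address is available as a long.

```java
public final class AlignmentSketch {
    // Alignment discussed in this thread (64 bytes). Must be a power of two
    // for the bit-mask arithmetic below to be valid.
    static final int ALIGNMENT = 64;

    /**
     * Returns how many bytes must be skipped from the given raw address to
     * reach the next 64-byte boundary (0 if the address is already aligned).
     * Hypothetical helper, for illustration only.
     */
    static int alignmentOffset(long address) {
        int remainder = (int) (address & (ALIGNMENT - 1));
        return remainder == 0 ? 0 : ALIGNMENT - remainder;
    }

    public static void main(String[] args) {
        long address = 1000L; // made-up address, not from a real allocation
        int offset = alignmentOffset(address);
        System.out.println("offset = " + offset
                + ", aligned address = " + (address + offset));
    }
}
```

With the made-up address 1000, the offset is 24, so the usable region would start at 1024. An allocator using this approach would presumably need to request extra bytes up front so that skipping the offset does not shrink the usable capacity.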

-Kiril

> On Jun 16, 2016, at 21:32, Kiril Menshikov <kmenshikov@gmail.com> wrote:
> 
> Yes, I’d rather write the test.
> 
>> On Jun 16, 2016, at 18:59, Jacques Nadeau <jacques@apache.org> wrote:
>> 
>>> Netty buffers always allocate memory aligned to 64 bytes. So each new
>>> ArrowBuf will be aligned to 64 bytes as well, with offset = 0.
>>> 
>> 
>> You confirmed that both the Netty chunk as well as buffer allocations
>> (ArrowBufs returned from here [1]) are on 64-byte offsets? Can you maybe
>> write some tests/add some assertion to the code so we protect against that
>> changing?
>> 
>> 
>>> 
>>> I don't fully understand why new allocations should be on a 64-byte
>>> offset?
>>> 
>> 
>> As part of the Arrow spec, each separate piece of memory must have 64
>> byte-sized-word alignment and 64 byte padding. For example, if you have
>> NullableVarChar, you'll need three buffers: nullable bits, four byte
>> offsets and data buffer. Each of those must be on a 64 byte offset and be a
>> length that is a multiple of 64 bytes.
>> 
>> [1]
>> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BufferAllocator.java#L37
>> 
>> 
>>> 
>>> -Kiril
>>> 
>>> 
>>> 
>>> On Jun 14, 2016, at 00:22, Jacques Nadeau <jacques@apache.org> wrote:
>>> 
>>> Yes, I think there are two main components. Also, I accidentally said 64
>>> bits when I should have said 64 bytes.
>>> 
>>> 1. New allocations should be on 64 byte offsets
>>> 2. Serializing existing vectors must be done such that they are always in
>>> an increment of 64 bytes. This is necessary to avoid copying when sending
>>> across the wire, otherwise the receiver would need to slice up/copy the
>>> incoming datastream. This would be done by ensuring that the setValueCount
>>> and similar operations (capacity) are done at the right range. I'd expect
>>> this second one to be best done on top of Steven's work.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Jun 13, 2016 at 2:14 PM, Kiril Menshikov <kmenshikov@gmail.com>
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> Does this mean that the offset must be adjusted depending on the UDLE memory,
>>> so that the new memory address will be aligned to 64 bits?
>>> 
>>> 
>>> The first thing we should do for the alignment in Java is adjust the
>>> allocator so that it always allocates on a 64 bit offset. Does someone
>>> want to look at that?
>>> 
>>> 
>>> 
>>> Thanks,
>>> -Kiril
>>> 
>>> On Jun 11, 2016, at 22:45, Jacques Nadeau <jacques@apache.org> wrote:
>>> 
>>> Steven is on vacation for a couple of days. His focus as I understand it
>>> is rationalizing the code so it is cleaner, correct for arrow versus drill
>>> representation differences (such as decimal, nulls, etc) and has more unit
>>> tests. Once he gets back in the next day or two, hopefully he can post a
>>> wip patch.
>>> 
>>> The first thing we should do for the alignment in Java is adjust the
>>> allocator so that it always allocates on a 64 bit offset. Does someone
>>> want to look at that?
>>> On Jun 10, 2016 5:35 PM, "Gaurav Agarwal" <gaurav130403@gmail.com> wrote:
>>> 
>>> 
>>> I am also interested in this. Do we need to know Drill before we start
>>> implementing for Arrow?
>>> On Jun 10, 2016 9:45 PM, "Wail Alkowaileet" <wael.y.k@gmail.com> wrote:
>>> 
>>> On Wed, Jun 8, 2016 at 9:26 PM, Micah Kornfield <emkornfield@gmail.com>
>>> wrote:
>>> 
>>> Hi Steven,
>>> Is the patch focused on the alignment/padding, or are there other
>>> issues as well?
>>> 
>>> 
>>> I'm interested in this as well.
>>> 
>>> 
>>> Thanks,
>>> Micah
>>> 
>>> On Tue, Jun 7, 2016 at 11:22 PM, Steven Phillips <steven@dremio.com>
>>> wrote:
>>> 
>>> I am currently working on a patch that addresses this, as well as removing
>>> some of the residual code from Drill that isn't really needed in Arrow
>>> (such as the Drill types, MaterializedField, etc.)
>>> 
>>> I will be posting this within a few days.
>>> 
>>> On Tue, Jun 7, 2016 at 5:54 PM, Leif Walsh <leif.walsh@gmail.com> wrote:
>>> 
>>> 
>>> I am also interested in this.
>>> On Tue, Jun 7, 2016 at 17:37 Holden Karau <holden@pigscanfly.ca> wrote:
>>> 
>>> 
>>> Hi Everyone,
>>> 
>>> I'm looking to help get started with Arrow & Spark and to that end I'd like
>>> to start with getting the Java implementation closer to the spec / C
>>> implementation. I'm wondering what places people know the differences are
>>> between the two?
>>> 
>>> Cheers,
>>> 
>>> Holden :)
>>> 
>>> --
>>> --
>>> Cheers,
>>> Leif
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> *Regards,*
>>> Wail Alkowaileet
>>> 
> 

