arrow-dev mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: regular arrow sync up
Date Fri, 19 Aug 2016 20:27:52 GMT
Hi William, you will need to send an email to dev-unsubscribe@arrow.apache.org

On Fri, Aug 19, 2016 at 11:57 AM, William Wood
<willwoood@yahoo.com.invalid> wrote:
> Can someone remove me from this thread, please.
>
> Thanks
>
> Sent from my iPhone
>
>> On Aug 18, 2016, at 8:54 PM, Micah Kornfield <emkornfield@gmail.com> wrote:
>>
>> Thanks Julien for organizing the meeting and taking notes.  I wrote up some
>> initial thoughts on shared memory IPC on
>> https://issues.apache.org/jira/browse/ARROW-263
>>
>> I'll try to flesh out a more concrete spec today/tomorrow.
>>
>> -Micah
>>
>>> On Thu, Aug 18, 2016 at 10:25 AM, Julien Le Dem <julien@dremio.com> wrote:
>>>
>>> My notes: (I'll schedule another one in 2 weeks but people should feel free
>>> to do ad-hoc discussion in the meantime)
>>>
>>> Attendees and their topic of interest for today:
>>> - Micah Kornfield: Dictionary encoding, Reusing dictionaries across record
>>> batches, Shared memory, memory management, releasing memory shared across
>>> processes
>>> - Wes McKinney: Finalize types (Category, ...), File format, RPC format,
>>> IPC
>>> - Julien Le Dem: finalize metadata (RPC, IPC, File), File format
>>> implementation, UDF use case
>>> - Erol: Shared memory across Java and C++ to share large amounts of data
>>>
>>> Arrow IPC:
>>>  - Shared memory:
>>>     - current version doesn’t do Schema negotiation yet.
>>>     - all unit tests read and write memory with a predefined schema
>>> and a known base address.
>>>     - no dictionary encoding yet.
>>>  - issues to discuss:
>>>    - communicating the base memory address:
>>>       - possibly use RPC for coordination.
>>>    - options for shared memory
>>>      - forking a process: anonymous shared memory is inherited implicitly
>>>      - starting a new process: need to create alternate shared memory that
>>> needs to be cleaned up
>>>      - direct memory-mapped system call (communicate file name to
>>> subprocess).
>>>  - Action (Micah): create a JIRA to sum this up
>>>
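The memory-mapped option in the notes above (map a file and communicate its name to a subprocess) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the Arrow IPC design: the flat little-endian int32 layout and the helper names `produce`, `consume`, and `consume_in_subprocess` are hypothetical and exist only for this example.

```python
import mmap
import os
import struct
import subprocess
import sys
import tempfile

# Assumed layout for illustration only: a flat run of little-endian int32
# values with no header or schema. This is NOT the Arrow IPC format.


def produce(values):
    """Write a non-empty list of ints through a memory map; return the path."""
    size = 4 * len(values)
    fd, path = tempfile.mkstemp(suffix=".buf")
    try:
        os.ftruncate(fd, size)  # the file needs its final size before mapping
        with mmap.mmap(fd, size) as m:
            m[:] = struct.pack(f"<{len(values)}i", *values)
            m.flush()
    finally:
        os.close(fd)
    return path


def consume(path, count):
    """Map the file read-only and decode `count` int32 values."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 4 * count, access=mmap.ACCESS_READ) as m:
            return list(struct.unpack_from(f"<{count}i", m, 0))


# Source of a hypothetical consumer process. It only receives the file name
# and the element count on its command line, then maps the file read-only.
CONSUMER = """
import mmap, struct, sys
path, count = sys.argv[1], int(sys.argv[2])
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 4 * count, access=mmap.ACCESS_READ) as m:
        print(sum(struct.unpack_from(f"<{count}i", m, 0)))
"""


def consume_in_subprocess(path, count):
    """Start a new process, pass it the file name, and return the sum it prints."""
    result = subprocess.run(
        [sys.executable, "-c", CONSUMER, path, str(count)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)
```

In this sketch the producer owns the file and would delete it only after the consumer has finished, which is one simple way to handle the cleanup concern raised in the notes.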
>>> - Memory management:
>>>  - the process producing the data will allocate the memory and pass it
>>> as read-only. It needs to wait for the consumer to finish before releasing it.
>>>     - one option is memory mapped file (persistent independent of the
>>> process)
>>>     - each process responsible for its memory. Reader needs to release
>>> memory.
>>>  - mechanism for handling too much memory allocation.
>>>  - In the case of record batches over RPC this is not an issue (memory is
>>> copied over).
>>>
>>>  - RPC transport:
>>>     - definition of the protocol and how we send messages.
>>>  - File transport
>>>
>>> - Dictionary encoding:
>>>    - start simple: simple buffer<int> layout
>>>    - enable extension in the future (v2: bit packing?)
>>>
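To make the "simple buffer<int> layout" idea concrete, here is a minimal Python sketch of dictionary encoding: each distinct value is stored once in a dictionary, and the column becomes a plain buffer of integer indices into it. The function names and the use of `array("i")` for the index buffer are assumptions for illustration, not the layout the group settled on, and nulls and bit packing are left out.

```python
from array import array


def dictionary_encode(values):
    """Return (dictionary, indices) for a list of hashable values.

    dictionary: distinct values in order of first appearance.
    indices: a flat buffer of signed ints, one per input value.
    """
    dictionary = []
    positions = {}        # value -> index in `dictionary`
    indices = array("i")  # the simple integer index buffer
    for value in values:
        index = positions.get(value)
        if index is None:
            index = len(dictionary)
            positions[value] = index
            dictionary.append(value)
        indices.append(index)
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original values from the dictionary and index buffer."""
    return [dictionary[i] for i in indices]
```

Because the dictionary is separate from the index buffer, the same dictionary could in principle be reused across several record batches, which is the reuse question raised in the attendee topics.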
>>> - Category type:
>>>   - Semantic difference from dictionary encoding.
>>>   - TODO(Julien): Add Category type in Parquet?
>>>
>>>
>>>> On Thu, Aug 18, 2016 at 9:39 AM, Julien Le Dem <julien@dremio.com> wrote:
>>>>
>>>> Hi Nicole.
>>>> Can you try again?
>>>> I was accepting you but it did not seem to work.
>>>> Julien
>>>>
>>>> On Thu, Aug 18, 2016 at 9:26 AM, Nicole Nemer <Nicole.Nemer@rms.com>
>>>> wrote:
>>>>
>>>>> I am trying to join and it is not letting me in...
>>>>> nn
>>>>> --
>>>>> Nicole Nemer, PhD
>>>>> Technical Architect/Dev Manager
>>>>>
>>>>> 303-641-3340
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On 8/18/16, 10:00 AM, "Julien Le Dem" <julien@dremio.com> wrote:
>>>>>>
>>>>>> And this is starting now.
>>>>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>>>>>
>>>>>>> On Wed, Aug 17, 2016 at 7:07 PM, Julien Le Dem <julien@dremio.com> wrote:
>>>>>>
>>>>>>> Here is the hangout link for tomorrow:
>>>>>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>>>>>>
>>>>>>> I have also added everyone who replied to that thread to a Google
>>>>>>> Calendar event.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 17, 2016 at 6:12 PM, Wes McKinney <wesmckinn@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> hi folks,
>>>>>>>>
>>>>>>>> Reminder that the Arrow sync is tomorrow morning at 09:00 Pacific
>>>>>>>> (http://timesched.pocoo.org/?date=2016-08-18&tz=pacific-standard-time!&range=540,600).
>>>>>>>> I believe Julien will send a public Google hangout link to the mailing
>>>>>>>> list for you all to join.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wes
>>>>>>>>
>>>>>>>> On Tue, Aug 16, 2016 at 11:07 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
>>>>>>>>> +1. If there is demand for an Asia-friendly time we can change things
>>>>>>>>> up from week to week.
>>>>>>>>>
>>>>>>>>>> On Aug 16, 2016, at 10:52 AM, Jacques Nadeau <jacques@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> sounds good
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 16, 2016 at 10:39 AM, Julien Le Dem <julien@dremio.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Based on the feedback I'm proposing Thursday Aug 18 at 4PM UTC as the
>>>>>>>>>>> first Arrow sync.
>>>>>>>>>>> That's:
>>>>>>>>>>> - 9AM PDT (San Francisco)
>>>>>>>>>>> - 12PM EDT (New York)
>>>>>>>>>>> - 5PM BST (London)
>>>>>>>>>>> - 6PM CEST (Paris, Berlin)
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 9, 2016 at 6:45 AM, Uwe L. Korn <uwelk@xhochy.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +1 for bi-weekly and European-friendly times: CET (GMT+1)
>>>>>>>>>>>>
>>>>>>>>>>>>> On 09.08.2016 at 00:39, Julien Le Dem <julien@dremio.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also to all who are responding let me know your timezone as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 8, 2016 at 3:30 PM, Micah Kornfield <emkornfield@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds good to me as well.  Biweekly would be preferred.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Monday, August 8, 2016, Wes McKinney <wesmckinn@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> hi Julien -- this sounds like a good idea, also +1 for bi-weekly. I
>>>>>>>>>>>>>>> will do my best to join when possible. So far we've mostly been
>>>>>>>>>>>>>>> communicating via pull request, so I think periodic syncs will be
>>>>>>>>>>>>>>> helpful.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Wes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Aug 8, 2016 at 2:45 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My preference would be for bi-weekly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Taylor
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Aug 8, 2016, at 5:25 PM, Julien Le Dem <julien@dremio.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>> My experience with Parquet is that a regular sync up over hangout
>>>>>>>>>>>>>>>>> helps with keeping in touch and staying updated about what everyone
>>>>>>>>>>>>>>>>> is doing.
>>>>>>>>>>>>>>>>> I was thinking of scheduling it weekly or bi-weekly.
>>>>>>>>>>>>>>>>> Who would join?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way it goes is first we do a round table where people introduce
>>>>>>>>>>>>>>>>> themselves and list the topics they'd like to talk or hear about.
>>>>>>>>>>>>>>>>> That makes the agenda and we go through it.
>>>>>>>>>>>>>>>>> At the end we send notes to the mailing list with discussions and
>>>>>>>>>>>>>>>>> action items (for example: open JIRA, comment on JIRA, review PR,
>>>>>>>>>>>>>>>>> etc).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Julien
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Julien
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Julien
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Julien
>>>>
>>>>
>>>> --
>>>> Julien
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>
