arrow-dev mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: regular arrow sync up
Date Fri, 19 Aug 2016 02:54:35 GMT
Thanks Julien for organizing the meeting and taking notes.  I wrote up some
initial thoughts on shared memory IPC on
https://issues.apache.org/jira/browse/ARROW-263

I'll try to flesh out a more concrete spec today/tomorrow.

-Micah

On Thu, Aug 18, 2016 at 10:25 AM, Julien Le Dem <julien@dremio.com> wrote:

> My notes: (I'll schedule another one in 2 weeks but people should feel free
> to do ad-hoc discussion in the meantime)
>
> Attendees and their topic of interest for today:
>  - Micah Kornfield: Dictionary encoding, Reusing dictionaries across record
> batches, Shared memory, memory management, releasing memory shared across
> processes
>  - Wes McKinney: Finalize types (Category, ...), File format, RPC format,
> IPC
>  - Julien Le Dem: finalize metadata (RPC, IPC, File), File format
> implementation, UDF use case
>  - Erol: Shared memory across Java and C++ to share large amounts of data
>
> Arrow IPC:
>   - Shared memory:
>      - current version doesn’t do Schema negotiation yet.
>      - all unit tests read and write memory with a predefined schema
> and a known base address.
>      - no dictionary encoding yet.
>   - issues to discuss:
>     - communicating the base memory address:
>        - possibly use RPC for coordination.
>     - options for shared memory
>       - forking a process: anonymous shared memory implicitly
>       - starting a new process. Need to spawn alternate shared memory that
> needs to be cleaned up
>       - direct memory mapped system call (communicate file name to
> subprocess).
>   - Action (Micah) create a JIRA to sum this up
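
A minimal sketch of the "direct memory mapped system call (communicate file name to subprocess)" option listed above, in Python rather than in Arrow's own Java or C++ code. The header layout (an 8-byte count followed by int32 values) and the function names are assumptions made for illustration, not anything agreed in the meeting. The reader decodes with offsets relative to the start of the mapping, which is one way to avoid having to communicate an absolute base address.

```python
import mmap
import os
import struct
import tempfile


def write_int32_buffer(path, values):
    """Producer side: write an element count, then little-endian int32 values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<q", len(values)))             # 8-byte element count
        f.write(struct.pack(f"<{len(values)}i", *values))   # payload


def read_int32_buffer(path):
    """Consumer side: map the file read-only and decode by offset from the map start."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (count,) = struct.unpack_from("<q", m, 0)
            return list(struct.unpack_from(f"<{count}i", m, 8))


def roundtrip(values):
    """Write then read back in one process; only the path would cross a process boundary."""
    fd, path = tempfile.mkstemp(suffix=".buf")
    os.close(fd)
    try:
        write_int32_buffer(path, values)
        return read_int32_buffer(path)
    finally:
        os.remove(path)  # the file outlives its writer, so it needs explicit cleanup
```

In a two-process setup only `path` would be sent to the consumer, for example over the RPC channel the notes suggest for coordination.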
>
>  - Memory management:
>   - the process producing the data will allocate the memory and pass it
> read only. It needs to wait for the consumer to be done to release it.
>      - one option is memory mapped file (persistent independent of the
> process)
>      - each process responsible for its memory. Reader needs to release
> memory.
>   - mechanism for handling too much memory allocation.
>   - In the case of record batches over RPC this is not an issue (memory is
> copied over).
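
The ownership rule above (the producer allocates, hands out read-only access, and waits for the consumer before releasing) can be sketched in-process as a reader count. `ProducerBuffer` and its methods are hypothetical names; a real cross-process version would need a shared counter or an explicit release message, which this sketch does not attempt.

```python
class ProducerBuffer:
    """Hypothetical producer-owned buffer that tracks outstanding readers."""

    def __init__(self, payload):
        self._payload = bytes(payload)  # immutable, so views over it are read-only
        self._readers = 0
        self.freed = False

    def acquire(self):
        """Hand a read-only view to a consumer and count it as outstanding."""
        if self.freed:
            raise RuntimeError("buffer already released")
        self._readers += 1
        return memoryview(self._payload)

    def release(self, view):
        """Consumer signals it is done with its view."""
        view.release()
        self._readers -= 1

    def try_free(self):
        """Producer frees the memory only once no reader still holds a view."""
        if self._readers == 0:
            self._payload = b""
            self.freed = True
        return self.freed
```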
>
>   - RPC transport
>      definition of the protocol and how we send messages.
>   - File transport
>
>  - Dictionary encoding:
>     - start simple: simple buffer<int> layout
>     - enable extension in the future (v2: bit packing?)
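
One way to read "start simple: simple buffer<int> layout" is a dictionary of distinct values plus a plain integer index buffer. The sketch below assumes that reading; the function names are illustrative and it says nothing about how Arrow's actual layout was later specified.

```python
from array import array


def dictionary_encode(values):
    """Return (dictionary, indices): distinct values in first-seen order, plus int indices."""
    dictionary = []
    index_of = {}
    indices = array("i")  # plain integer buffer, no bit packing
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original values by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]
```

Keeping the dictionary separate from the indices is also what would allow the same dictionary to be reused across record batches, as raised in the attendee topics.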
>
> - Category type:
>    - Semantic difference with Dictionary encoded.
>    - TODO(Julien): Add Category type in Parquet?
>
>
> On Thu, Aug 18, 2016 at 9:39 AM, Julien Le Dem <julien@dremio.com> wrote:
>
> > Hi Nicole.
> > Can you try again?
> > I was accepting you but it did not seem to work.
> > Julien
> >
> > On Thu, Aug 18, 2016 at 9:26 AM, Nicole Nemer <Nicole.Nemer@rms.com>
> > wrote:
> >
> >> I am trying to join and it is not letting me in...
> >> nn
> >> --
> >> Nicole Nemer, PhD
> >> Technical Architect/Dev Manager
> >>
> >> 303-641-3340
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 8/18/16, 10:00 AM, "Julien Le Dem" <julien@dremio.com> wrote:
> >>
> >> >And this is starting now.
> >> >https://plus.google.com/hangouts/_/dremio.com/arrow
> >> >
> >> >On Wed, Aug 17, 2016 at 7:07 PM, Julien Le Dem <julien@dremio.com> wrote:
> >> >
> >> >> Here is the hangout link for tomorrow:
> >> >> https://plus.google.com/hangouts/_/dremio.com/arrow
> >> >>
> >> >> I have also added to a google calendar event everyone who replied to that
> >> >> thread.
> >> >>
> >> >>
> >> >> On Wed, Aug 17, 2016 at 6:12 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> >> >>
> >> >>> hi folks,
> >> >>>
> >> >>> Reminder that the Arrow sync is tomorrow morning at 09:00 Pacific
> >> >>> (http://timesched.pocoo.org/?date=2016-08-18&tz=pacific-standard-time!&range=540,600).
> >> >>> I believe Julien will send a public Google hangout link to the mailing
> >> >>> list for you all to join.
> >> >>>
> >> >>> Thanks
> >> >>> Wes
> >> >>>
> >> >>> On Tue, Aug 16, 2016 at 11:07 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
> >> >>> > +1. If there is demand for an Asia-friendly time we can change things
> >> >>> > up from week to week.
> >> >>> >
> >> >>> >> On Aug 16, 2016, at 10:52 AM, Jacques Nadeau <jacques@apache.org> wrote:
> >> >>> >>
> >> >>> >> sounds good
> >> >>> >>
> >> >>> >>> On Tue, Aug 16, 2016 at 10:39 AM, Julien Le Dem <julien@dremio.com> wrote:
> >> >>> >>>
> >> >>> >>> Based on the feedback I'm proposing Thursday Aug 18 at 4PM UTC as the
> >> >>> >>> first Arrow sync.
> >> >>> >>> That's:
> >> >>> >>> - 9AM PDT (San Francisco)
> >> >>> >>> - 12PM EDT (New York)
> >> >>> >>> - 5PM BST (London)
> >> >>> >>> - 6PM CEST (Paris, Berlin)
> >> >>> >>>
> >> >>> >>>> On Tue, Aug 9, 2016 at 6:45 AM, Uwe L. Korn <uwelk@xhochy.com> wrote:
> >> >>> >>>>
> >> >>> >>>> +1 for bi-weekly and European-friendly times: CET (GMT+1)
> >> >>> >>>>
> >> >>> >>>>> On 09.08.2016 at 00:39, Julien Le Dem <julien@dremio.com> wrote:
> >> >>> >>>>>
> >> >>> >>>>> Also to all who are responding let me know your timezone as well.
> >> >>> >>>>>
> >> >>> >>>>> On Mon, Aug 8, 2016 at 3:30 PM, Micah Kornfield <emkornfield@gmail.com>
> >> >>> >>>>> wrote:
> >> >>> >>>>>
> >> >>> >>>>>> Sounds good to me as well.  Biweekly would be preferred.
> >> >>> >>>>>>
> >> >>> >>>>>>> On Monday, August 8, 2016, Wes McKinney <wesmckinn@gmail.com> wrote:
> >> >>> >>>>>>>
> >> >>> >>>>>>> hi Julien -- this sounds like a good idea, also +1 for bi-weekly. I
> >> >>> >>>>>>> will do my best to join when possible. So far we've mostly been
> >> >>> >>>>>>> communicating via pull request, so I think periodic syncs will be
> >> >>> >>>>>>> helpful.
> >> >>> >>>>>>>
> >> >>> >>>>>>> - Wes
> >> >>> >>>>>>>
> >> >>> >>>>>>> On Mon, Aug 8, 2016 at 2:45 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
> >> >>> >>>>>>>> +1
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> My preference would be for bi-weekly.
> >> >>> >>>>>>>>
> >> >>> >>>>>>>> -Taylor
> >> >>> >>>>>>>>
> >> >>> >>>>>>>>> On Aug 8, 2016, at 5:25 PM, Julien Le Dem <julien@dremio.com> wrote:
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> Hi all,
> >> >>> >>>>>>>>> My experience with Parquet is that a regular sync up over hangout
> >> >>> >>>>>>>>> helps us keep in touch and stay updated about what everyone is doing.
> >> >>> >>>>>>>>> I was thinking of scheduling it weekly or bi-weekly.
> >> >>> >>>>>>>>> Who would join?
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> The way it goes is first we do a round table where people introduce
> >> >>> >>>>>>>>> themselves and list the topics they'd like to talk or hear about.
> >> >>> >>>>>>>>> That makes the agenda and we go through it.
> >> >>> >>>>>>>>> At the end we send notes to the mailing list with discussions and
> >> >>> >>>>>>>>> action items (for example: open JIRA, comment on JIRA, review PR, etc).
> >> >>> >>>>>>>>>
> >> >>> >>>>>>>>> --
> >> >>> >>>>>>>>> Julien
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> --
> >> >>> >>>>> Julien
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> --
> >> >>> >>> Julien
> >> >>> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Julien
> >> >>
> >> >
> >> >
> >> >
> >> >--
> >> >Julien
> >>
> >>
> >
> >
> > --
> > Julien
> >
>
>
>
> --
> Julien
>
