arrow-dev mailing list archives

From Henry Robinson <he...@cloudera.com>
Subject Re: Question about mutability
Date Thu, 25 Feb 2016 20:05:50 GMT
On 25 February 2016 at 11:57, Todd Lipcon <todd@cloudera.com> wrote:

> On Thu, Feb 25, 2016 at 11:48 AM, Henry Robinson <henry@cloudera.com>
> wrote:
> > It seems like Arrow would benefit from a complementary effort to define a
> > (simple) streaming memory transfer protocol between processes. Although Wes
> > mentioned RPC earlier, I'd hope that's really a placeholder for "fast
> > inter-process message passing", since the overheads of RPC as usually
> > realised would likely be very significant, even just for control messages.
> >
>
> I'm a little skeptical of this as being completely generalizable, at
> least in the context of storage systems. For example, in Kudu, our
> scanner responses aren't just tabular data, but also contain various
> bits of control info that are very Kudu-specific. Our scanner requests
> are super Kudu-specific, with stuff like predicate pushdown, tablet
> IDs, etc.
>

The way I'm thinking about it is that someone upstream makes a Kudu-specific
request, but as part of that request provides a descriptor of a shared
ring-buffer. Reading Arrow batches from, and writing them to, that buffer is
part of a simple standard protocol. I agree there's little justification for a
general "request data from data source" protocol.

Or does Kudu have very specific stuff in the "get next batch" API?
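To make that concrete, here's a minimal sketch of the kind of thing I mean: a single-producer/single-consumer byte ring living in a shared memory segment, carrying length-prefixed opaque batches. Everything here (the `ShmRing` name, the two-cursor header layout, the polling design) is hypothetical — no such protocol exists in Arrow today, and a real one would need proper atomics and "buffer no longer empty" signalling rather than this toy:

```python
# Hypothetical shared ring-buffer sketch: single producer, single consumer,
# length-prefixed batches. Layout: 16-byte header (write cursor, read cursor
# as little-endian u64s) followed by the data region.
import struct
from multiprocessing import shared_memory

HEADER = 16  # two u64 cursors: bytes written, bytes read


class ShmRing:
    def __init__(self, name=None, capacity=1 << 16):
        if name is None:
            # Producer side: create the segment and zero the cursors.
            self.shm = shared_memory.SharedMemory(create=True,
                                                  size=HEADER + capacity)
            self.shm.buf[:HEADER] = bytes(HEADER)
        else:
            # Consumer side: attach using the descriptor from the request.
            self.shm = shared_memory.SharedMemory(name=name)
        self.capacity = self.shm.size - HEADER

    def _cursors(self):
        return struct.unpack_from("<QQ", self.shm.buf, 0)

    def put(self, batch: bytes) -> bool:
        """Append one length-prefixed batch; returns False if it won't fit."""
        w, r = self._cursors()
        need = 4 + len(batch)
        if self.capacity - (w - r) < need:
            return False
        payload = struct.pack("<I", len(batch)) + batch
        for i, b in enumerate(payload):
            self.shm.buf[HEADER + (w + i) % self.capacity] = b
        struct.pack_into("<Q", self.shm.buf, 0, w + need)  # publish
        return True

    def get(self):
        """Pop one batch, or None if the ring is empty."""
        w, r = self._cursors()
        if w == r:
            return None
        raw = bytes(self.shm.buf[HEADER + (r + i) % self.capacity]
                    for i in range(4))
        (n,) = struct.unpack("<I", raw)
        data = bytes(self.shm.buf[HEADER + (r + 4 + i) % self.capacity]
                     for i in range(n))
        struct.pack_into("<Q", self.shm.buf, 8, r + 4 + n)  # consume
        return data

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()
```

The Kudu-specific request (or UDF call, etc.) would carry `ring.shm.name` as the descriptor; the other process attaches with `ShmRing(name=...)` and pops batches.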


>
> For the use case of external UDFs or UDTFs, I think there's good
> opportunity to standardize, though - that interface is a much more
> "pure functional interface" where we can reasonably expect everyone to
> conform to the same protocol.
>
> Regarding RPC overhead, so long as the batches are reasonable size, I
> think it could be well amortized. Kudu's RPC system can do 300k
> RPCs/sec or more on localhost, so if each one of those RPCs
> corresponds to an 8MB data buffer, that's 2.4TB/sec of bandwidth
> (about 100x more than RAM bandwidth). Or put another way, with 8MB
> batches, assuming 24GB/sec RAM bandwidth, you only need 3k round
> trips/second to saturate the bus. Even a super slow RPC control plane
> should be able to handle that in its sleep.
>
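For what it's worth, the amortization arithmetic above does check out; a quick back-of-the-envelope using Todd's own figures (300k RPCs/sec, 8MB batches, ~24GB/sec RAM bandwidth):

```python
# Sanity-checking the quoted numbers: data rate if every RPC carried a full
# batch, and how few round trips are needed to saturate RAM bandwidth.
RPCS_PER_SEC = 300_000
BATCH_BYTES = 8 * 10**6        # 8MB batches
RAM_BW = 24 * 10**9            # ~24GB/sec memory bandwidth, in bytes/sec

data_rate = RPCS_PER_SEC * BATCH_BYTES   # 2.4e12 bytes/sec, i.e. 2.4TB/sec
round_trips = RAM_BW / BATCH_BYTES       # 3000 round trips/sec saturates RAM
```

So the control plane only needs ~3k round trips/sec, about 100x less than what the RPC system can deliver on localhost.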

Fair enough. What about the latency of those RPCs?


>
>
> > This may just mean a common way to access a ring buffer with a defined
> > in-memory layout and a way of signalling "buffer no longer empty". Without
> > doing this in a standardised way, we'd run the risk of N^2 communication
> > protocols between N different analytics systems.
> >
> > Such a thing could also potentially be used for RDMA-based signalling that
> > avoids the CPU, as per this paper from MSR:
> > https://www.usenix.org/conference/nsdi14/technical-sessions/dragojevic
>
> Agreed that RDMA is a very interesting thing to consider, especially as
> next gen networking systems start to become more common. I'll check out
> that paper.
>

It's good! So is the follow-on one in SOSP (
http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/227-dragojevic.pdf)
but it's less relevant to this question.


>
> -Todd
>
> >
> >
> >
> >>
> >> Another difficulty is security -- with a format like Arrow that embeds
> >> internal pointers, any process following the pointers needs to either
> >> trust the other processes with write access to the segment, or needs
> >> to carefully verify any pointers before following them. Otherwise,
> >> it's pretty easy to crash the reader if you're a malicious writer. In
> >> the case of using arrow from a shared daemon (eg impalad) to access an
> >> untrusted UDF (eg python running in a setuid task container), this is
> >> a real concern IMO.
> >>
> >> -Todd
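Todd's point about verifying pointers is worth spelling out. A sketch of the bounds check a reader would need before following an (offset, length) pair found in an untrusted shared segment — the framing here is hypothetical and is not Arrow's actual metadata layout:

```python
# Hypothetical validation helper: never dereference an (offset, length)
# pair from untrusted shared memory without checking it stays in bounds.
def checked_slice(segment: memoryview, offset: int, length: int) -> memoryview:
    if offset < 0 or length < 0:
        raise ValueError("negative offset or length")
    end = offset + length       # Python ints can't overflow; in C you would
    if end > len(segment):      # also have to guard against offset+length wrap
        raise ValueError("slice escapes the shared segment")
    return segment[offset:end]
```

Note that even a validated slice can be rewritten afterwards by a malicious writer with access to the segment (a classic TOCTOU problem), so a careful reader would copy the metadata out before parsing it, not just validate in place.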
> >>
> >>
> >>
> >> On Thu, Feb 25, 2016 at 8:37 AM, Wes McKinney <wes@cloudera.com> wrote:
> >> > hi Leif -- you've articulated almost exactly my vision for pandas
> >> > interoperability with Spark via Arrow. There are some questions to sort
> >> > out, like shared memory / mmap management and security / sandboxing
> >> > questions, but in general moving toward a model where RPC's contain shared
> >> > memory offsets to Arrow data rather than shipping large chunks of
> >> > serialized data through stdin/stdout would be a huge leap forward.
> >> >
> >> > On Wed, Feb 24, 2016 at 4:26 PM, Leif Walsh <leif.walsh@gmail.com> wrote:
> >> >
> >> >> Here's the image that popped into my mind when I heard about this
> >> >> project; at best it's a motivating example, at worst it's a distraction:
> >> >>
> >> >> 1. Spark reads parquet from wherever into an arrow structure in shared
> >> >> memory.
> >> >> 2. Spark executor calls into the Python half of pyspark with a handle to
> >> >> this memory.
> >> >> 3. Pandas computes some summary over this data in place, produces some
> >> >> smaller result set, also in memory, also in arrow format.
> >> >> 4. Spark driver collects results (without ser/de overheads, just straight
> >> >> byte buffer reads across the network), concatenates them in memory.
> >> >> 5. Spark driver hands that memory to local pandas, which does more
> >> >> computation over that in place.
> >> >>
> >> >> This is what got me excited: it kills a lot of the spark overheads related
> >> >> to data (and for a kicker, also enables full pandas compatibility along
> >> >> with statsmodels/scikit/etc in step 3). I suppose if this is not a vision
> >> >> the arrow devs have, I'd be interested in why. Not looking for any promises
> >> >> yet though.
> >> >> On Wed, Feb 24, 2016 at 14:52 Corey Nolet <cjnolet@apache.org> wrote:
> >> >>
> >> >> > So far, how does the integration with the Spark project look? Do you
> >> >> > envision cached Spark partitions allocating Arrows? I could imagine this
> >> >> > would be absolutely huge for being able to ask questions of real-time
> >> >> > data sets across applications.
> >> >> >
> >> >> > On Wed, Feb 24, 2016 at 2:49 PM, Zhe Zhang <zhz@apache.org> wrote:
> >> >> >
> >> >> > > Thanks for the insights Jacques. Interesting to learn the thoughts
> >> >> > > on zero-copy sharing.
> >> >> > >
> >> >> > > mmap allows sharing address spaces via filesystem interface. That
> >> >> > > has some security concerns but with the immutability restrictions
> >> >> > > (as you clarified here), it sounds feasible. What other options do
> >> >> > > you have in mind?
> >> >> > >
> >> >> > > On Wed, Feb 24, 2016 at 11:30 AM Corey Nolet <cjnolet@gmail.com> wrote:
> >> >> > >
> >> >> > > > Agreed,
> >> >> > > >
> >> >> > > > I thought the whole purpose was to share the memory space (using
> >> >> > > > possibly unsafe operations like ByteBuffers) so that it could be
> >> >> > > > directly shared without copy. My interest in this is to have it
> >> >> > > > enable fully in-memory computation. Not just "processing" as in
> >> >> > > > Spark, but as a fully in-memory datastore that one application can
> >> >> > > > expose and share with others (e.g. In memory structure is
> >> >> > > > constructed from a series of parquet files, somehow, then Spark
> >> >> > > > pulls it in, does some computations, exposes a data set, etc...).
> >> >> > > >
> >> >> > > > If you are leaving the allocation of the memory to the applications
> >> >> > > > and underneath the memory is being allocated using direct
> >> >> > > > bytebuffers, I can't see exactly why the problem is fundamentally
> >> >> > > > hard - especially if the applications themselves are worried about
> >> >> > > > exposing their own memory spaces.
> >> >> > > >
> >> >> > > >
> >> >> > > > On Wed, Feb 24, 2016 at 2:17 PM, Andrew Brust <
> >> >> > > > andrew.brust@bluebadgeinsights.com> wrote:
> >> >> > > >
> >> >> > > > > Hmm...that's not exactly how Jacques described things to me
> >> >> > > > > when he briefed me on Arrow ahead of the announcement.
> >> >> > > > >
> >> >> > > > > -----Original Message-----
> >> >> > > > > From: Zhe Zhang [mailto:zhz@apache.org]
> >> >> > > > > Sent: Wednesday, February 24, 2016 2:08 PM
> >> >> > > > > To: dev@arrow.apache.org
> >> >> > > > > Subject: Re: Question about mutability
> >> >> > > > >
> >> >> > > > > I don't think one application/process's memory space will be
> >> >> > > > > made available to other applications/processes. It's
> >> >> > > > > fundamentally hard for processes to share their address spaces.
> >> >> > > > >
> >> >> > > > > IIUC, with Arrow, when application A shares data with
> >> >> > > > > application B, the data is still duplicated in the memory
> >> >> > > > > spaces of A and B. It's just that data
> >> >> > > > > serialization/deserialization are much faster with Arrow
> >> >> > > > > (compared with Protobuf).
> >> >> > > > >
> >> >> > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet <cjnolet@gmail.com> wrote:
> >> >> > > > >
> >> >> > > > > > Forgive me if this question seems ill-informed. I just
> >> >> > > > > > started looking at Arrow yesterday. I looked around the
> >> >> > > > > > github a tad.
> >> >> > > > > >
> >> >> > > > > > Are you expecting the memory space held by one application
> >> >> > > > > > to be mutable by that application and made available to all
> >> >> > > > > > applications trying to read the memory space?
> >> >> > > > > >
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >> --
> >> >> --
> >> >> Cheers,
> >> >> Leif
> >> >>
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679
