arrow-dev mailing list archives

From "Micah Kornfield (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors
Date Fri, 19 Aug 2016 01:52:20 GMT
Micah Kornfield created ARROW-263:
-------------------------------------

             Summary: Design an initial IPC mechanism for Arrow Vectors
                 Key: ARROW-263
                 URL: https://issues.apache.org/jira/browse/ARROW-263
             Project: Apache Arrow
          Issue Type: New Feature
            Reporter: Micah Kornfield
            Assignee: Micah Kornfield


Prior discussion on this topic [1].

Use-cases:
1.  User-defined function (UDF) execution: one process wants to execute a user-defined function
written in another language (e.g. Java executing a function defined in Python; this involves
creating Arrow Arrays in Java, sending them to Python, and receiving a new set of Arrow Arrays
produced in Python back in the Java process).
2.  If a storage system and a query engine are running on the same host, we might want to use
IPC instead of RPC (e.g. Apache Drill querying Apache Kudu).

Assumptions:
1.  The IPC mechanism should be usable from the core set of supported languages (Java, Python,
C) on POSIX and ideally Windows systems.  Ideally, we would not need to add dependencies on
additional libraries outside of each language's standard library.
We want to leverage shared memory for Arrays to avoid doubling RAM requirements by duplicating
the same Array in different memory locations.
2.  Under some circumstances shared memory might be more efficient than FIFOs or sockets (in
other scenarios it won’t be; see the thread below).
3.  Security is not a concern for V1; we assume all running processes are “trusted”.
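
As a rough illustration of the shared-memory assumption, here is a minimal sketch using only
Python's standard library (multiprocessing.shared_memory, Python 3.8+).  The helper names
write_int32_array and read_int32_array are hypothetical and are not part of any Arrow API; a
real design would also need to carry Arrow's validity bitmaps, offsets and schema.  The point
is only that a consumer attaches to the segment by name, so the operating system maps the
same memory rather than allocating a second copy.

```python
import struct
from multiprocessing import shared_memory


def write_int32_array(values):
    """Producer side (hypothetical helper): place int32 values in a new segment."""
    # Zero-sized segments are not allowed, so always request at least one byte.
    size = max(1, 4 * len(values))
    shm = shared_memory.SharedMemory(create=True, size=size)
    # Little-endian 32-bit integers, written directly into the shared buffer.
    struct.pack_into(f"<{len(values)}i", shm.buf, 0, *values)
    return shm  # the caller owns the segment and must close() and unlink() it


def read_int32_array(name, length):
    """Consumer side (hypothetical helper): attach by name and decode the values."""
    shm = shared_memory.SharedMemory(name=name)  # maps the segment, no new allocation
    try:
        return list(struct.unpack_from(f"<{length}i", shm.buf, 0))
    finally:
        shm.close()  # detach only; the producer decides when to unlink


if __name__ == "__main__":
    segment = write_int32_array([1, 2, 3])
    try:
        # In practice the name and length would be sent to another process.
        print(read_int32_array(segment.name, 3))
    finally:
        segment.close()
        segment.unlink()
```

The segment name and element count still have to reach the other process somehow, which is
the channel discovery question this document leaves out of scope.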

Requirements:
1.  Resource management:
    a.  Both processes need a way of allocating memory for Arrow Arrays so that data can be
passed from one process to another.
    b.  There must be a mechanism to clean up unused Arrow Arrays to limit resource usage while
avoiding race conditions when processing arrays.
2.  Schema negotiation: before sending data, both processes need to agree on the schema each
one will produce.
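
One possible way to approach requirement 1b is reference counting: an Array is freed only
after every holder has released it.  The sketch below is an assumption for illustration, not
a proposed design.  The class ArrayLeaseTable and its methods are hypothetical names, and it
uses an in-process lock; across processes the counts would have to live in shared memory or
be managed by a single owning process.

```python
import threading


class ArrayLeaseTable:
    """Hypothetical lease table: frees an array once its last holder releases it."""

    def __init__(self, free_fn):
        self._free_fn = free_fn  # called exactly once per array, after the last release
        self._counts = {}
        self._lock = threading.Lock()

    def publish(self, array_id):
        """Producer registers a new array and holds the initial lease."""
        with self._lock:
            if array_id in self._counts:
                raise KeyError(f"{array_id!r} is already published")
            self._counts[array_id] = 1

    def acquire(self, array_id):
        """Consumer takes a lease; fails if the array was freed or never published."""
        with self._lock:
            if array_id not in self._counts:
                raise KeyError(f"{array_id!r} is not available")
            self._counts[array_id] += 1

    def release(self, array_id):
        """Drop one lease; free the array when no leases remain."""
        with self._lock:
            self._counts[array_id] -= 1
            last = self._counts[array_id] == 0
            if last:
                del self._counts[array_id]
        # Free outside the lock so a slow free_fn does not block other holders.
        if last:
            self._free_fn(array_id)


if __name__ == "__main__":
    freed = []
    table = ArrayLeaseTable(freed.append)
    table.publish("batch-0")   # producer holds one lease
    table.acquire("batch-0")   # consumer holds a second lease
    table.release("batch-0")   # producer is done; consumer still reading
    table.release("batch-0")   # consumer is done; array is freed now
    print(freed)
```

Because acquire and release check and update the count under the same lock, a consumer
cannot take a lease on an array that is in the middle of being freed.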

Out of scope requirements:
1.  IPC channel metadata discovery is out of scope of this document.  Discovery can be provided
by passing appropriate command line arguments, configuration files or other mechanisms like
RPC (in which case RPC channel discovery is still an issue).

[1] http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3C8D5F7E3237B3ED47B84CF187BB17B666148E7322@SHSMSX103.ccr.corp.intel.com%3E




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
