arrow-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: Arrow sync in 30min
Date Thu, 01 Sep 2016 17:07:13 GMT
Notes from the sync:
Next meeting: same day and time, in 2 weeks.

*Attendees and their interests:*
Erol:
 - looking for example of IPC from Java to C++
 - C++ to C++
 - Java to Java

Jacques: Dremio

Wes:
 - file format
 - plan for integration testing. (has time in the next 2 to 4 weeks)

Tsuyoshi: Newbie.
 - try to contribute.
 - how to contribute? What’s going on in the project

Uwe:
  - trying to build on linux default python. packaging.

Julien:
  - arrow file
  - make java code match the spec
  - release


*Agenda:*
 - java to C++ IPC
 - file format
 - integration testing
 - How to contribute. state of project.
 - python packaging for linux pip
 - release 0.1

*Topics discussed:*
Java to C++ IPC: ARROW-263
 - communication between 2 processes using shared memory.
 - some info: https://github.com/netty/netty-tcnative
 - Erol:
   - would rather use memory maps than files.
   - had trouble with type sizes not always being the same in C++ and
Java.
   - goal: needs to move large amounts of data across languages (Python,
MATLAB, .NET) without going through files. A shared memory map would make
that easier.
 - current thinking: use RPC for communicating memory location.
 - Actions:
  - Julien: create a JIRA for prototype.
  - Erol: share prototype of IPC.
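
To illustrate the two points Erol raised (memory maps, and type sizes that
differ between languages), here is a minimal Python sketch. It is not the
Arrow format or the prototype mentioned in the notes: the layout (an int32
count followed by int64 values, little-endian) and the function names are
assumptions made up for this example. The point is only that spelling out
fixed widths and byte order removes any dependence on what "int" or "long"
means in a given language.

```python
import mmap
import struct

# Assumed layout for this sketch only: a little-endian int32 count,
# followed by `count` little-endian int64 values. Explicit widths mean
# a C++ or Java reader sees the same sizes as this Python writer.
HEADER = struct.Struct("<i")
VALUE = struct.Struct("<q")


def write_values(path, values):
    """Write the values into a memory-mapped region backed by `path`."""
    size = HEADER.size + VALUE.size * len(values)
    # Create the backing file at its final size before mapping it.
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as buf:
        HEADER.pack_into(buf, 0, len(values))
        for i, v in enumerate(values):
            VALUE.pack_into(buf, HEADER.size + i * VALUE.size, v)
        buf.flush()


def read_values(path):
    """Map the region read-only and decode it using the same layout."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        (count,) = HEADER.unpack_from(buf, 0)
        return [
            VALUE.unpack_from(buf, HEADER.size + i * VALUE.size)[0]
            for i in range(count)
        ]
```

In this sketch the path only serves as a name that both processes can open;
a second process calling read_values on the same path maps the same region
instead of parsing a file format.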

RPC:
 - for now looking at gRPC. Possibly use HTTP directly.
 - need a sidecar for RPC.
 - Kudu did their own (krpc)
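
One possible reading of "use RPC for communicating memory location" is that
the RPC message carries only a small descriptor while the data itself stays
in shared memory. The sketch below shows that idea in Python. The
BufferLocation type, its fields, and the JSON encoding are hypothetical
stand-ins chosen for illustration; they are not part of Arrow, gRPC, or
krpc, and the notes do not settle on any wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BufferLocation:
    """Hypothetical descriptor; not part of any Arrow or gRPC API."""
    path: str    # name of the shared memory map both processes can open
    offset: int  # byte offset of the buffer inside the map
    length: int  # buffer length in bytes


def encode(loc: BufferLocation) -> bytes:
    """Serialize the descriptor; JSON stands in for a real RPC payload."""
    return json.dumps(asdict(loc), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> BufferLocation:
    """Rebuild the descriptor on the receiving side."""
    fields = json.loads(payload.decode("utf-8"))
    return BufferLocation(
        path=str(fields["path"]),
        offset=int(fields["offset"]),
        length=int(fields["length"]),
    )
```

The message stays a few dozen bytes regardless of how large the buffer is,
which is the appeal of sending the location rather than the data.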

File format:
 - Julien created a first version of the file format (Java implementation).
 - Wes to do a C++ implementation of the file format.
 - create integration tests based on that (possibly use JNI for Java to
drive the C++ lib and check for compatibility)

How to contribute:
 - Pull requests/jira/dev list + sync to discuss.

Tsuyoshi: interested in the limitations of Java byte arrays. Use Arrow as
the backend for Spark byte arrays to make Spark more scalable.

Wes:
 - Arrow to Spark DataFrames.
 - PySpark integration.
 - Action: Wes to open a JIRA.

Python packaging:
 - Uwe has Python packages to generate Parquet files through pandas and
Arrow.
    - use pandas in Python to generate Arrow, then Arrow to Parquet files
and back.
 - conda-forge: build portable binaries not tied to a specific Linux
distribution.
https://conda-forge.github.io/

Release:
 - create blocker jiras to release:
https://issues.apache.org/jira/browse/ARROW-272
   - Action (Uwe): JIRA to make parquet-cpp optional.
   - create a release of parquet-cpp
 - 0.1 is not considered stable yet.







On Thu, Sep 1, 2016 at 9:00 AM, Julien Le Dem <julien@dremio.com> wrote:

> starting now
>
> On Thu, Sep 1, 2016 at 8:32 AM, Julien Le Dem <julien@ledem.net> wrote:
>
>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>
>> Julien
>
>
>
>
> --
> Julien
>



-- 
Julien
