arrow-dev mailing list archives

From Wes McKinney
Subject Re: Arrow based data access
Date Sun, 19 Mar 2017 18:46:43 GMT
hi Julien,

Having standard RPC/REST messaging protocols for systems to implement
sounds like a great idea to me. Some systems might choose to pack
Arrow files or streams into a Protocol Buffer or Thrift message, but
it would be good to have a "native" protocol for the streaming file
format in particular.

I will be happy to provide feedback on a spec for this and to help
solicit input from other projects that may use the spec.


On Wed, Mar 15, 2017 at 11:02 PM, Julien Le Dem wrote:
> We’re working on finalizing a few types and writing the integration tests
> that go with them.
> At this point we have a solid foundation in the Arrow project.
> As a next step I’m going to look into adding an Arrow RPC/REST interface
> dedicated to data retrieval.
> We had several discussions about this and I’m going to formalize a spec and
> ask for review.
> This Arrow based data access interface is intended to be used by systems
> that need access to data for processing (SQL engines, processing
> frameworks, …) and implemented by storage layers or really anything that
> can produce data (including processing frameworks returning result sets, for
> example). That will greatly simplify integration between the many actors in
> each category.
> The basic premise is to be able to fetch data in Arrow format while
> benefiting from no-overhead serialization/deserialization and getting
> the data in columnar format.
> Some obvious topics that come to mind:
> - How do we identify a dataset?
> - How do we specify projections?
> - What about predicate push-downs or, more generally, parameters?
> - What underlying protocol to use? HTTP2?
> - push vs pull?
> - build a reference implementation (Suggestions?)
> Potential candidates for using this:
> - to consume data or to expose result sets: Drill, Hive, Presto, Impala,
> Spark, RecordService...
> - as a server: Kudu, HBase, Cassandra, …
> --
> Julien
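
The first three questions in the quoted list (dataset identity, projections,
predicate push-down) can be made concrete with a small request descriptor.
The sketch below is purely illustrative and is not part of any Arrow spec:
the names DataRequest and Predicate, their fields, the JSON encoding and the
example dataset path are all assumptions made up for this example. It only
covers the request side; the response would carry the Arrow stream itself.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, List


@dataclass
class Predicate:
    """Hypothetical filter to push down: <column> <op> <value>."""
    column: str
    op: str
    value: Any


@dataclass
class DataRequest:
    """Hypothetical request for a dataset (not an Arrow-defined message)."""
    dataset: str                                      # how the dataset is identified
    columns: List[str] = field(default_factory=list)  # projection; empty means all
    predicates: List[Predicate] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Predicate objects to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DataRequest":
        raw = json.loads(text)
        return cls(
            dataset=raw["dataset"],
            columns=list(raw.get("columns", [])),
            predicates=[Predicate(**p) for p in raw.get("predicates", [])],
        )


# Example: ask for two columns of an assumed dataset, filtered on a timestamp.
request = DataRequest(
    dataset="warehouse/events",
    columns=["user_id", "ts"],
    predicates=[Predicate("ts", ">=", 1489600000)],
)
wire = request.to_json()

# The descriptor survives a round trip through its wire form unchanged.
assert DataRequest.from_json(wire) == request
```

JSON is used here only to keep the sketch dependency-free; the same fields
could equally be carried in a Protocol Buffer or Thrift message, and the
choice of transport (HTTP2 or otherwise) and push vs pull is left open.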
