arrow-dev mailing list archives

From: Wes McKinney <wesmck...@gmail.com>
Subject: Re: Arrow based data access
Date: Sun, 19 Mar 2017 18:46:43 GMT

Hi Julien,

Having standard RPC/REST messaging protocols for systems to implement
sounds like a great idea to me. Some systems might choose to pack
Arrow files or streams into a Protocol Buffer or Thrift message, but
it would be good to have a "native" protocol for the streaming file
format in particular.

I will be happy to provide feedback on a spec for this and to help
solicit input from other projects that may use the spec.

Thanks,
Wes

On Wed, Mar 15, 2017 at 11:02 PM, Julien Le Dem <julien@dremio.com> wrote:
> We’re working on finalizing a few types and writing the integration tests
> that go with them.
>
> At this point we have a solid foundation in the Arrow project.
>
> As a next step I’m going to look into adding an Arrow RPC/REST interface
> dedicated to data retrieval.
>
> We had several discussions about this and I’m going to formalize a spec and
> ask for review.
>
> This Arrow-based data access interface is intended to be used by systems
> that need access to data for processing (SQL engines, processing
> frameworks, …) and implemented by storage layers or really anything that
> can produce data (including processing frameworks returning result sets,
> for example). That will greatly simplify integration between the many
> actors in each category.
>
> The basic premise is to be able to fetch data in Arrow format, benefiting
> from no-overhead serialization/deserialization and getting the data in
> columnar format.
>
> Some obvious topics that come to mind:
>
> - How do we identify a dataset?
>
> - How do we specify projections?
>
> - What about predicate pushdown or, more generally, parameters?
>
> - What underlying protocol should we use? HTTP/2?
>
> - Push vs. pull?
>
> - Build a reference implementation (suggestions?)
>
> Potential candidates for using this:
>
> - to consume data or to expose result sets: Drill, Hive, Presto, Impala,
> Spark, RecordService...
> - as a server: Kudu, HBase, Cassandra, …
>
> --
> Julien
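
To make the open questions in the quoted list a bit more concrete (dataset
identification, projections, predicate pushdown), here is a minimal sketch
of what a read request might carry. Everything in it is an assumption made
for illustration: the names ReadRequest and Predicate, the field names, the
JSON encoding, and the dataset name "warehouse.events" are all hypothetical
and do not come from any Arrow spec. It only models the request side; the
response would be Arrow data and is not shown.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, List


@dataclass
class Predicate:
    """Hypothetical filter the server may push down: column <op> value."""
    column: str
    op: str
    value: Any


@dataclass
class ReadRequest:
    """Hypothetical request shape; not part of any Arrow specification."""
    dataset: str                                    # how a dataset is identified
    columns: List[str] = field(default_factory=list)          # projection
    predicates: List[Predicate] = field(default_factory=list)  # pushdown

    def to_json(self) -> str:
        # asdict() recurses into the nested Predicate dataclasses.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "ReadRequest":
        raw = json.loads(payload)
        return cls(
            dataset=raw["dataset"],
            columns=list(raw.get("columns", [])),
            predicates=[Predicate(**p) for p in raw.get("predicates", [])],
        )


# Example: ask for two columns of one dataset, filtered on a timestamp.
request = ReadRequest(
    dataset="warehouse.events",
    columns=["user_id", "ts"],
    predicates=[Predicate("ts", ">=", "2017-01-01")],
)
round_tripped = ReadRequest.from_json(request.to_json())
```

An empty columns list could be read as "all columns", but that choice, like
the transport and the push vs. pull question, is exactly what a spec would
need to settle.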
