arrow-dev mailing list archives

From Wes McKinney <...@cloudera.com>
Subject Re: Getting started guide
Date Sat, 27 Feb 2016 19:11:40 GMT
Note that we have not prioritized building a lot of new software for
Arrow (outside of the basic C++ implementation and the Drill Java
extraction) because there are a number of details that we need to work
out as a group in the coming weeks:

- Lingering physical memory layout questions, see working documents
https://github.com/apache/arrow/tree/master/format
- Metadata / schema details
- IPC / wire protocol

For the project, these aspects of the Arrow specification are much more
important than any lines of code, because they define what it means to
"use Arrow". So getting started with Arrow is less about using a
particular piece of software and more about conforming data structures
and memory sharing to the Arrow specification. I will start a separate
thread shortly about the metadata unless someone beats me to it.
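
As a rough illustration of what "conforming data structures" to a
columnar layout can look like, here is a minimal sketch of a nullable
int32 column stored as a validity bitmap plus a contiguous values
buffer. It assumes the layout as later documented in the format
directory (bit set = valid, least-significant-bit numbering,
little-endian values); those details were still under discussion at
the time of this thread. The helper names build_int32_array and
read_int32 are hypothetical, and buffer padding is omitted.

```python
import struct


def build_int32_array(values):
    """Encode ints/None as (validity_bitmap, values_buffer, null_count).

    Hypothetical helper for illustration only; not part of any Arrow library.
    """
    n = len(values)
    bitmap = bytearray((n + 7) // 8)  # one bit per slot, all zero (null) at first
    data = bytearray()
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1
            data += struct.pack("<i", 0)  # a null slot still occupies 4 bytes
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # assumed: LSB bit numbering, 1 = valid
            data += struct.pack("<i", v)  # assumed: little-endian int32
    return bytes(bitmap), bytes(data), null_count


def read_int32(bitmap, data, i):
    """Return slot i, or None if its validity bit is unset."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    return struct.unpack_from("<i", data, i * 4)[0]


bitmap, data, null_count = build_int32_array([1, None, 3, 4])
print(bin(bitmap[0]), len(data), null_count)
print([read_int32(bitmap, data, i) for i in range(4)])
```

Because every slot has a fixed width, reading element i is a constant-time
offset into the values buffer, and any process that agrees on this layout
can read the same bytes without a conversion step.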

Note: I will have some bandwidth over the next month to work on the C++
Arrow + Python Arrow + Parquet toolchain, so I plan to drop a series
of patches to enable Python pandas users to read Parquet files (using
https://github.com/apache/parquet-cpp) via Arrow data structures
(since pandas requires Arrow data to be marshalled to NumPy arrays
before it can be used).

- Wes

On Sat, Feb 27, 2016 at 10:06 AM, Jason Altekruse
<altekrusejason@gmail.com> wrote:
> The Java version of the Arrow project is reasonably consumable. The code
> was extracted from the Apache Drill project, which has been using this
> columnar representation since its inception.
>
> Steven Phillips is working on finishing the extraction of the necessary
> interfaces from Drill over in his fork of the arrow repository [1]. When
> this gets checked in, Drill will be completely separated from Arrow and will
> just depend on it as any other consumer would. The branch is still a work in
> progress, but I believe he is getting close to posting a patch for review.
> If you want, you could check out the code in the Drill repository right now
> [2]. Seeing the vector classes requires running the build once, because we
> use code generation to create vectors for each data type. After running the
> Drill build, the vector classes can be found at
> exec/vector/target/generated-sources.
>
> [1] - https://github.com/StevenMPhillips/arrow
> [2] - https://github.com/apache/drill
>
> On Fri, Feb 26, 2016 at 8:56 PM, Vishnu Viswanath <
> vishnu.viswanath25@gmail.com> wrote:
>
>> Thanks Leif,
>> I am not trying to incorporate Arrow into any production system. I am just
>> trying to learn this new DS.
>> If you have come across any blogs, or if you can tell me what the first
>> steps in using Arrow should be, could you please let me know?
>>
>> --
>> Thanks and Regards,
>> Vishnu Viswanath,
>> *www.vishnuviswanath.com <http://www.vishnuviswanath.com/>*
>>
>> On Fri, Feb 26, 2016 at 9:36 PM, Leif Walsh <leif.walsh@gmail.com> wrote:
>>
>> > Arrow doesn't seem to be ready for use yet. I think it's an aspirational
>> > project. I'd watch for announcements soon, but I wouldn't try to
>> > incorporate it today.
>> >
>> > On Fri, Feb 26, 2016 at 2:10 PM Slava B <gslavale@gmail.com> wrote:
>> >
>> > > Agreed, I am also looking for such a tutorial.
>> > >
>> > > On Fri, Feb 26, 2016 at 11:05 AM, Vishnu Viswanath <
>> > > vishnu.viswanath25@gmail.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I just joined this list, and would like to know if there is any
>> > > > documentation on how to get started with Apache Arrow. I am
>> > > > interested in using Arrow along with Spark or Flink.
>> > > >
>> > > > Thanks and Regards,
>> > > > Vishnu Viswanath,
>> > > > *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
>> > > >
>> > >
>> > --
>> > Cheers,
>> > Leif
>> >
>>
