arrow-dev mailing list archives

From Vishnu Viswanath <vishnu.viswanat...@gmail.com>
Subject Re: Getting started guide
Date Sat, 27 Feb 2016 23:26:32 GMT
Thank you everyone.

On Sat, Feb 27, 2016 at 2:00 PM, Jason Altekruse <altekrusejason@gmail.com>
wrote:

> Wes is right, I should have qualified my statement about the Drill code. As
> is stated in the Arrow repo initial design docs, the exact memory layout is
> not finalized. That being said, while the format is designed to be used
> in-memory, it doesn't have the same sticking point about backwards
> compatibility of a persistent format. Eventually it is possible that
> someone may use arrow structures for a long-lived in-memory cache, or
> persist arrow vectors to disk, but this would not be an optimal time to
> start such a project, as the format is not fully defined.
>
> A more appropriate statement would have been, in Drill (and soon to be
> fully moved over to the arrow repo) there is an interface that you can use
> to access arrow-like structures, that will be evolving along with the arrow
> standard's ongoing development. If you are willing to work alongside the
> upcoming refactorings you could start integrating these interfaces into
> other projects. The in-memory structures they use do not yet represent a
> version of the arrow specification, as we have not yet finished discussing
> several parts of the specification, as summarized nicely by Wes, but they
> will be updated throughout the upcoming discussions.
>
> On Sat, Feb 27, 2016 at 11:11 AM, Wes McKinney <wes@cloudera.com> wrote:
>
> > Note that we have not prioritized building a lot of new software for
> > Arrow (outside of the basic C++ implementation and the Drill Java
> > extraction) because there are a number of details that we need to work
> > out as a group in the coming weeks:
> >
> > - Lingering physical memory layout questions, see working documents
> > https://github.com/apache/arrow/tree/master/format
> > - Metadata / schema details
> > - IPC / wire protocol
> >
> > As a project, these aspects of the Arrow specification are much more
> > important than any lines of code, because they define what it means to
> > "use Arrow". So getting started with Arrow is less about using a
> > particular piece of software but rather conforming data structures and
> > memory sharing to the Arrow specification. I will start a separate
> > thread shortly about the metadata unless someone beats me to it.
> >
> > Note: I will have some bandwidth the next month to work on the C++
> > Arrow + Python Arrow + Parquet toolchain, so I plan to drop a series
> > of patches to enable Python pandas users to read Parquet files (using
> > https://github.com/apache/parquet-cpp) via Arrow data structures
> > (since pandas requires Arrow to be marshalled to NumPy arrays to be
> > used).
> >
> > - Wes
> >
> > On Sat, Feb 27, 2016 at 10:06 AM, Jason Altekruse
> > <altekrusejason@gmail.com> wrote:
> > > The Java version of the Arrow project is reasonably consumable. The code
> > > was extracted from the Apache Drill project, which has been using this
> > > columnar representation since its inception.
> > >
> > > Steven Phillips is working on finishing the extraction of the necessary
> > > interfaces from Drill over in his fork of the arrow repository [1]. When
> > > this gets checked in, Drill will be completely separated from Arrow and
> > > will just depend on it as any other consumer would. The branch is still a
> > > work in progress, but I believe he is getting close to posting a patch for
> > > review. If you want, you could check out the code in the Drill repository
> > > right now [2]. Seeing the vector classes requires running the build once,
> > > because we use code generation to create vectors for each data type. After
> > > running the Drill build, the vector classes can be found at
> > > exec/vector/target/generated-sources.
> > >
> > > [1] - https://github.com/StevenMPhillips/arrow
> > > [2] - https://github.com/apache/drill
> > >
> > > On Fri, Feb 26, 2016 at 8:56 PM, Vishnu Viswanath <
> > > vishnu.viswanath25@gmail.com> wrote:
> > >
> > >> Thanks Leif,
> > > >> I am not trying to incorporate Arrow into any production system. I am
> > > >> just trying to learn this new DS.
> > > >> If you have come across any blogs, or if you can tell me what the
> > > >> starting steps in using Arrow should be, could you please let me know?
> > >>
> > >> --
> > >> Thanks and Regards,
> > >> Vishnu Viswanath,
> > >> *www.vishnuviswanath.com <http://www.vishnuviswanath.com/>*
> > >>
> > > >> On Fri, Feb 26, 2016 at 9:36 PM, Leif Walsh <leif.walsh@gmail.com> wrote:
> > >>
> > > >> > Arrow doesn't seem to be ready for use yet.  I think it's an
> > > >> > aspirational project.  I'd watch for announcements soon but I
> > > >> > wouldn't try to incorporate today.
> > >> >
> > > >> > On Fri, Feb 26, 2016 at 2:10 PM Slava B <gslavale@gmail.com> wrote:
> > >> >
> > >> > > Agree, also looking for such tutorial
> > >> > >
> > >> > > On Fri, Feb 26, 2016 at 11:05 AM, Vishnu Viswanath <
> > >> > > vishnu.viswanath25@gmail.com> wrote:
> > >> > >
> > >> > > > Hi All,
> > >> > > >
> > > >> > > > I just joined this list, and would like to know if there is any
> > > >> > > > documentation on how to get started with Apache Arrow. I am
> > > >> > > > interested in using arrow along with Spark or Flink.
> > >> > > >
> > >> > > > Thanks and Regards,
> > >> > > > Vishnu Viswanath,
> > >> > > > *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
> > >> > > >
> > >> > >
> > >> > --
> > >> > --
> > >> > Cheers,
> > >> > Leif
> > >> >
> > >>
> >
>



-- 
Thanks and Regards,
Vishnu Viswanath,
*www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
