arrow-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: Arrow to/from Parquet in Java
Date Mon, 13 Feb 2017 23:10:30 GMT
I haven't looked at the write path yet (Arrow -> Parquet).
If you are interested in contributing, I would be happy to help.
I'll publish a PR for the read path for flat schema in the coming months.


On Mon, Feb 13, 2017 at 2:35 PM, Nikola Zezelj <NZezelj@seaportglobal.com>
wrote:

> Hi Julien,
>
> Yes, I did notice that you've started playing around with it. Any idea
> about the timing?
> My use case is fairly straightforward. Arrow to Parquet and back with
> simple flat schemas.
>
> Thanks,
> Nikola
>
> -----Original Message-----
> From: Julien Le Dem [mailto:julien@dremio.com]
> Sent: Monday, February 13, 2017 5:22 PM
> To: dev@arrow.apache.org
> Subject: Re: Arrow to/from Parquet in Java
>
> Hi Nikola,
> The Parquet to Arrow reader should live in the Parquet repo here:
> https://github.com/apache/parquet-mr/tree/master/parquet-arrow
> For now it just has schema conversion code between Parquet and Arrow.
> I've been working on a Java Parquet to Arrow reader.
> What is your use case?
> Did you have specific types of schemas in mind (flat/nested)?
>
>
> On Mon, Feb 13, 2017 at 11:11 AM, Nikola Zezelj <NZezelj@seaportglobal.com>
> wrote:
>
> > Thanks Wes,
> >
> > Could I potentially do the same from Java (either through JNI or JNA)?
> > Alternatively, could I use Hadoop's ParquetWriter to accomplish this
> > task? Performance is definitely a concern, so I would appreciate any
> > input on how these two approaches would compare (if they are feasible).
> >
> > Thanks again,
> > Nikola
> >
> > -----Original Message-----
> > From: Wes McKinney [mailto:wesmckinn@gmail.com]
> > Sent: Sunday, February 12, 2017 2:25 PM
> > To: dev@arrow.apache.org
> > Subject: Re: Arrow to/from Parquet in Java
> >
> > hi Nikola,
> >
> > I believe Julien started working on this, but I'm not sure what stage
> > of development it's in.
> >
> > We've been building the Arrow/Parquet bridge in parquet-cpp, and it's
> > working very well (e.g.
> > http://wesmckinney.com/blog/python-parquet-multithreading/) -- the
> > nested data implementation is not yet completed, though.
> >
> > - Wes
> >
> > On Sat, Feb 11, 2017 at 7:34 PM, Nikola Zezelj <NZezelj@seaportglobal.com>
> > wrote:
> > > Hi,
> > >
> > > I am trying to convert between Arrow and Parquet formats in Java.
> > > How do I go about doing it?
> > > Any help would be greatly appreciated. Thanks!
> > >
> > > --
> > > Nikola Žeželj
> > >
> > >
> > > This message is for the intended recipient(s) only and subject to
> > > terms and conditions available at
> > > www.seaportglobal.com/pages/disclaimer
> > >
> > > Additional important disclosures:
> > > www.seaportglobal.com/pages/disclosures
> >
>
>
>
> --
> Julien
>



-- 
Julien
