arrow-dev mailing list archives

From Masayuki Takahashi <masayuki...@gmail.com>
Subject Re: Parquet+Arrow Java
Date Sun, 23 Jul 2017 13:19:40 GMT
Hi,

I am trying to convert Parquet files to Arrow:
https://gist.github.com/masayuki038/4be6c8538dfd4563a8d5ff743cf375ae

I have one question.

When converting Parquet to Arrow, is it the right approach to create one
Arrow VectorSchemaRoot per Parquet RowGroup?
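
To make the question concrete, the one-batch-per-row-group idea can be
sketched in plain Java. This is only a model, not the Parquet or Arrow API:
RowGroup and Batch are hypothetical stand-ins for a Parquet row group and an
Arrow VectorSchemaRoot, and every column is assumed to hold int values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupBatches {
    /** Hypothetical stand-in for one Parquet row group (column-major int data). */
    static final class RowGroup {
        final int[][] columns; // columns[c][r] = value of column c at row r
        RowGroup(int[][] columns) { this.columns = columns; }
        int rowCount() { return columns.length == 0 ? 0 : columns[0].length; }
    }

    /** Hypothetical stand-in for an Arrow VectorSchemaRoot: one vector per column. */
    static final class Batch {
        final List<int[]> vectors;
        final int rowCount;
        Batch(List<int[]> vectors, int rowCount) {
            this.vectors = vectors;
            this.rowCount = rowCount;
        }
    }

    /** Produces exactly one batch per row group, preserving row-group order. */
    static List<Batch> toBatches(List<RowGroup> rowGroups) {
        List<Batch> batches = new ArrayList<>();
        for (RowGroup rowGroup : rowGroups) {
            List<int[]> vectors = new ArrayList<>();
            for (int[] column : rowGroup.columns) {
                vectors.add(column.clone()); // copy the column into its own vector
            }
            batches.add(new Batch(vectors, rowGroup.rowCount()));
        }
        return batches;
    }

    public static void main(String[] args) {
        // A pretend file with two row groups of different sizes.
        List<RowGroup> file = Arrays.asList(
            new RowGroup(new int[][] {{1, 2, 3}, {4, 5, 6}}),
            new RowGroup(new int[][] {{7}, {8}}));
        List<Batch> batches = toBatches(file);
        for (int i = 0; i < batches.size(); i++) {
            Batch b = batches.get(i);
            System.out.println("batch " + i + ": " + b.rowCount + " rows, "
                + b.vectors.size() + " columns");
        }
    }
}
```

In this model the batch boundaries simply follow the row-group boundaries,
so each batch is as large as the row group it came from.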

Thanks.

2017-07-21 5:19 GMT+09:00 Wes McKinney <wesmckinn@gmail.com>:
> hi Sven,
>
> There is a placeholder project in apache/parquet-mr
> https://github.com/apache/parquet-mr/tree/master/parquet-arrow.
>
> It appears in the meantime that Dremio has created a vectorized
> Parquet <-> Arrow reader/writer which has just been open sourced under
> ASL 2.0: https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet
>
> I am sure they are very busy right now, but it may be worth discussing
> factoring out this Parquet <-> Arrow interface into a library
> component that can be donated to Apache Parquet.
>
> - Wes
>
> On Wed, Jul 19, 2017 at 4:28 PM, Sven Wagner-Boysen
> <sven.wagner-boysen@signavio.com> wrote:
>> Hi,
>>
>> I started looking into the projects Parquet and Arrow. Looks very promising
>> to me.
>>
>> I also came across PyArrow and the Parquet-Arrow integration in Python. Is
>> there something similar available for Java?
>>
>> https://arrow.apache.org/docs/python/parquet.html
>>
>> Thanks
>> Sven



-- 
高橋 真之
