arrow-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Java Parquet to Arrow Conversion
Date Wed, 19 Aug 2020 16:37:08 GMT
I believe there is code in the iceberg project to do this in pure Java [1].
Right now, there isn't a pure java implementation in the Arrow project.

[1]
https://github.com/apache/iceberg/tree/master/arrow/src/main/java/org/apache/iceberg/arrow/vectorized

On Wed, Aug 19, 2020 at 5:18 AM Chris Nuernberger <chris@techascent.com>
wrote:

> Also, javacpp has prepackaged C++ bindings to arrow for multiple OS's:
>
> http://bytedeco.org/javacpp-presets/arrow/apidocs/
>
> We have had success with javacpp
> <https://github.com/techascent/tech.opencv> in the past and it is much
> better now that their preprocess is based on Clang.
>
> On Tue, Aug 18, 2020 at 4:16 PM Chris Nuernberger <chris@techascent.com>
> wrote:
>
>> Thanks, that is helpful.
>>
>> Chris
>>
>> On Tue, Aug 18, 2020 at 10:24 AM Micah Kornfield <emkornfield@gmail.com>
>> wrote:
>>
>>> Hi Chris,
>>> There is an open PR to support this through C++'s Dataset functionality
>>> [1]. There was also a prior attempt that went stale, which I can't find at
>>> the moment.
>>>
>>> IIUC the main missing component at this point before the PR gets merged
>>> is integration to honor "-XX:MaxDirectMemorySize" settings.
>>>
>>> -Micah
>>>
>>> [1] https://github.com/apache/arrow/pull/7030
>>>
>>> On Tue, Aug 18, 2020 at 6:48 AM Chris Nuernberger <chris@techascent.com>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> We were wondering what the best way to convert a parquet file to an
>>>> arrow file would be via a java pathway.  I notice that the c++ layer
>>>> appears to have this conversion.
>>>>
>>>> The best hint I have seen so far is this gist:
>>>> https://gist.github.com/animeshtrivedi/76de64f9dab1453958e1d4f8eca1605f
>>>>
>>>> I also found this jni pathway for ORC files:
>>>> https://github.com/apache/arrow/tree/master/cpp/src/jni
>>>>
>>>> Another thought I had was to use the JNA or JNR and bind to the C glib
>>>> pathway.
>>>>
>>>> Thanks for any help,
>>>>
>>>> Chris
>>>>
>>>
