arrow-user mailing list archives

From annsshadow <craven...@163.com>
Subject Re:Re: [C++] How can I read streaming parquet file in v0.15.0
Date Fri, 01 Nov 2019 07:56:33 GMT
The arrow::RecordBatchReader needs an arrow::dataset::RecordBatchProjector, which in turn needs the Schema.
It seems that I can't get the schema first and then read the Parquet file as a stream with Arrow.

In my situation, the Parquet file lives in an object store such as S3. I can fetch it from the
network slice by slice, in slices of any size, but I can't hold the whole file in memory or on disk.

Your reply indicates that the C++ library can't read streaming Parquet yet, so what should I try
next, with Arrow or anything else?

Thank you for your work~~
At 2019-11-01 01:46:32, "Wes McKinney" <wesmckinn@gmail.com> wrote:
>You will want to use the GetRecordBatchReader C++ API here
>
>https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader.h#L152
>
>It may not be optimal for your use case. Support for streaming reads
>is not yet exposed in Python or other bindings as far as I know.
>
>There is work happening in the C++ Datasets project to better support
>this use case.
>
>On Wed, Oct 30, 2019 at 9:28 PM annsshadow <cravenboy@163.com> wrote:
>>
>>
>> hi~
>> I have a question about reading a Parquet file.
>> The official example reads the whole file from local disk.
>> I can't hold the whole Parquet file in memory; I can only fetch it slice by
>> slice from the network. How can I use Arrow to read the Parquet file?
>> thank you~
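
On the schema question raised in this thread: in the Parquet format, the schema is part of the file metadata stored at the end of the file, and the file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. With an object store that supports ranged reads, this suggests the schema could be fetched before any data pages. The following is a minimal sketch of that first step only, not part of the Arrow or Parquet C++ API: `FooterRange` and `LocateParquetFooter` are hypothetical names, and the sketch assumes a file with a plaintext (unencrypted) footer and a caller that has already fetched the last 8 bytes of the object.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the byte range [offset, offset + length)
// that holds the serialized file metadata (which includes the schema).
struct FooterRange {
  std::uint64_t offset;
  std::uint64_t length;
};

// Hypothetical helper, not an Arrow/Parquet API.
// tail      : the final 8 bytes of the object (e.g. from a ranged GET)
// file_size : total size of the object in bytes
// Returns false if the tail does not look like a plaintext Parquet footer.
bool LocateParquetFooter(const std::uint8_t tail[8], std::uint64_t file_size,
                         FooterRange* out) {
  // Smallest possible layout: 4-byte leading magic + 4-byte length + 4-byte
  // trailing magic.
  if (file_size < 12) return false;

  // The last 4 bytes must be the magic "PAR1".
  if (std::memcmp(tail + 4, "PAR1", 4) != 0) return false;

  // The 4 bytes before the magic are the metadata length, little-endian.
  const std::uint32_t length = static_cast<std::uint32_t>(tail[0]) |
                               (static_cast<std::uint32_t>(tail[1]) << 8) |
                               (static_cast<std::uint32_t>(tail[2]) << 16) |
                               (static_cast<std::uint32_t>(tail[3]) << 24);

  // The metadata must fit between the leading magic and the 8-byte tail.
  if (length > file_size - 12) return false;

  out->offset = file_size - 8 - length;
  out->length = length;
  return true;
}
```

A second ranged read over the returned range would then yield the serialized metadata, from which the schema and the row-group byte offsets can be decoded. Whether the row groups themselves fit the memory budget described above still depends on how the file was written.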