The arrow::RecordBatchReader needs an arrow::dataset::RecordBatchProjector, which in turn needs the Schema. It seems that I can't get the schema first and then read the Parquet file as a stream with Arrow.
In my situation, the Parquet file lives in an object store such as S3. I can fetch it from the network slice by slice, at any offset and size, but I can't hold the whole file in memory or on disk.
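To make the setup concrete, below is a minimal sketch (assuming a recent Arrow release; older releases use Status-based rather than Result-based signatures) of an arrow::io::RandomAccessFile that turns every read into a ranged network fetch, so the Parquet reader would only pull the footer and the row groups it actually needs. FetchRange is a hypothetical stand-in for my S3/HTTP ranged-GET client, not a real Arrow or AWS API:

#include <arrow/buffer.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/status.h>

#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for an S3/HTTP ranged-GET client: fill `out` with
// `nbytes` bytes starting at `offset`, return how many bytes were read.
arrow::Result<int64_t> FetchRange(int64_t offset, int64_t nbytes, uint8_t* out) {
  return arrow::Status::NotImplemented("plug the real network client in here");
}

// A file abstraction over ranged fetches: the Parquet reader can seek to
// the footer and to individual row groups without the whole file ever
// being in memory or on disk.
class RangedNetworkFile : public arrow::io::RandomAccessFile {
 public:
  explicit RangedNetworkFile(int64_t size) : size_(size) {}

  arrow::Status Close() override {
    closed_ = true;
    return arrow::Status::OK();
  }
  bool closed() const override { return closed_; }
  arrow::Result<int64_t> GetSize() override { return size_; }
  arrow::Status Seek(int64_t position) override {
    pos_ = position;
    return arrow::Status::OK();
  }
  arrow::Result<int64_t> Tell() const override { return pos_; }

  // The Parquet reader mostly uses ReadAt: one ranged GET per call.
  arrow::Result<int64_t> ReadAt(int64_t position, int64_t nbytes, void* out) override {
    return FetchRange(position, nbytes, static_cast<uint8_t*>(out));
  }

  arrow::Result<std::shared_ptr<arrow::Buffer>> ReadAt(int64_t position, int64_t nbytes) override {
    ARROW_ASSIGN_OR_RAISE(auto buffer, arrow::AllocateResizableBuffer(nbytes));
    ARROW_ASSIGN_OR_RAISE(int64_t n, ReadAt(position, nbytes, buffer->mutable_data()));
    ARROW_RETURN_NOT_OK(buffer->Resize(n));
    return std::shared_ptr<arrow::Buffer>(std::move(buffer));
  }

  arrow::Result<int64_t> Read(int64_t nbytes, void* out) override {
    ARROW_ASSIGN_OR_RAISE(int64_t n, ReadAt(pos_, nbytes, out));
    pos_ += n;
    return n;
  }

  arrow::Result<std::shared_ptr<arrow::Buffer>> Read(int64_t nbytes) override {
    ARROW_ASSIGN_OR_RAISE(auto buffer, ReadAt(pos_, nbytes));
    pos_ += buffer->size();
    return buffer;
  }

 private:
  int64_t size_;
  int64_t pos_ = 0;
  bool closed_ = false;
};

As far as I know, the arrow::fs::S3FileSystem work under way would eventually provide this kind of file out of the box.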
Your reply suggests that streaming Parquet reads are not fully supported yet, so what should I try next, with Arrow or with something else?
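For reference, this is the streaming shape I am hoping for: a sketch built on the GetRecordBatchReader API from your reply (signatures per a recent Arrow release; `source` could be the RangedNetworkFile above or any RandomAccessFile):

#include <arrow/io/interfaces.h>
#include <arrow/memory_pool.h>
#include <arrow/record_batch.h>
#include <arrow/status.h>
#include <arrow/type.h>
#include <parquet/arrow/reader.h>

#include <memory>
#include <numeric>
#include <vector>

arrow::Status StreamBatches(std::shared_ptr<arrow::io::RandomAccessFile> source) {
  // Opening the reader fetches only the footer metadata, so the schema is
  // known before any row group data is pulled over the network.
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(
      parquet::arrow::OpenFile(source, arrow::default_memory_pool(), &reader));

  std::shared_ptr<arrow::Schema> schema;
  ARROW_RETURN_NOT_OK(reader->GetSchema(&schema));

  // Ask for all row groups; passing a subset of indices would skip the rest.
  std::vector<int> row_groups(reader->num_row_groups());
  std::iota(row_groups.begin(), row_groups.end(), 0);

  std::shared_ptr<arrow::RecordBatchReader> batches;
  ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(row_groups, &batches));

  // Batches arrive incrementally; only the current one is in memory.
  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(batches->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    // ... process `batch` here ...
  }
  return arrow::Status::OK();
}

If opening the reader really touches only the footer, that would also answer my schema question, since the Parquet schema lives in the footer metadata.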
Thank you for your work~~

At 2019-11-01 01:46:32, "Wes McKinney" wrote:
>You will want to use the GetRecordBatchReader C++ API here
>
>https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader.h#L152
>
>It may not be optimal for your use case. Support for streaming reads
>is not yet exposed in Python or other bindings as far as I know.
>
>There is work happening in the C++ Datasets project to better support
>this use case.
>
>On Wed, Oct 30, 2019 at 9:28 PM annsshadow wrote:
>>
>> hi~
>> I have a question about reading Parquet files.
>> The official example reads the whole file from local disk.
>> Now I can't hold the whole Parquet file in memory; I can only fetch it slice by slice from the network, so how can I use Arrow to read it?
>> thank you~