arrow-dev mailing list archives

From Jakub Okoński (JIRA) <j...@apache.org>
Subject [jira] [Created] (ARROW-5030) read_row_group fails with Nested data conversions not implemented for chunked array outputs
Date Wed, 27 Mar 2019 14:35:00 GMT
Jakub Okoński created ARROW-5030:
------------------------------------

             Summary: read_row_group fails with Nested data conversions not implemented for chunked array outputs
                 Key: ARROW-5030
                 URL: https://issues.apache.org/jira/browse/ARROW-5030
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.12.0
            Reporter: Jakub Okoński


Hey, I'm trying to concatenate two files. To avoid reading everything into memory at once,
I wanted to use `read_row_group`, but it fails.
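
For context, the row-group-at-a-time pattern described above might look roughly like the sketch below. This is not the reporter's actual code: the helper names `iter_row_groups` and `concat_parquet` are hypothetical, and it assumes every input file shares the schema of the first one. On the affected files, the `read_row_group` call is where the error would surface.

```python
def iter_row_groups(parquet_file, columns=None):
    """Yield one table per row group so only one group is held in memory at a time."""
    for ix in range(parquet_file.num_row_groups):
        yield parquet_file.read_row_group(ix, columns=columns)


def concat_parquet(src_paths, dst_path, columns=None):
    """Append every row group of each source file to dst_path (hypothetical helper)."""
    # Imported here so iter_row_groups itself has no hard dependency on pyarrow.
    import pyarrow.parquet as pq

    writer = None
    try:
        for path in src_paths:
            pf = pq.ParquetFile(path)
            for table in iter_row_groups(pf, columns):
                if writer is None:
                    # Assumption: all inputs share the schema of the first table.
                    writer = pq.ParquetWriter(dst_path, table.schema)
                writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()
```

A call such as `concat_parquet(["a.parquet", "b.parquet"], "out.parquet")` (file names are placeholders) would then write the row groups of both inputs into one output file.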

 

I think it's due to fields like these:

{{pyarrow.Field<to: list<item: string>>}}

 

But I'm not sure. Is this a duplicate? The issue linked in the code is already resolved: https://github.com/apache/arrow/blob/fd0b90a7f7e65fde32af04c4746004a1240914cf/cpp/src/parquet/arrow/reader.cc#L915

 

The stack trace is:

 

{{  File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches}}
{{    table = pf.read_row_group(ix, columns=self._columns)}}
{{  File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group}}
{{    use_threads=use_threads)}}
{{  File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group}}
{{  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status}}
{{pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
