orc-dev mailing list archives

From "Aliaksei Sandryhaila (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ORC-28) Reading a subset of complex-type columns does not select the right columns
Date Wed, 23 Sep 2015 15:16:04 GMT
Aliaksei Sandryhaila created ORC-28:
---------------------------------------

             Summary: Reading a subset of complex-type columns does not select the right columns
                 Key: ORC-28
                 URL: https://issues.apache.org/jira/browse/ORC-28
             Project: Orc
          Issue Type: Bug
            Reporter: Aliaksei Sandryhaila


Selected columns are set through ReaderOptions.include() and correspond to the top-level columns
in an ORC file. The ReaderImpl constructor uses this information to determine which physical columns
to read from the file, but the current implementation does not translate the selected top-level
indices into physical column ids correctly.
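The translation the constructor needs can be sketched as follows. This is a minimal sketch, not the ORC C++ code: `selectPhysicalColumns` and its `subtreeSizes` input are hypothetical. It assumes the ORC convention that physical column ids are assigned by a pre-order walk of the schema with the root struct at id 0, so each top-level field and all of its nested columns occupy one contiguous run of ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the ORC C++ API.
// subtreeSizes[i]: number of physical columns under top-level field i+1
// (the field itself plus all of its descendants), in schema order.
// selectedFields: 1-based top-level field numbers, as used in this report.
// Returns one flag per physical column id. Assumes ids are assigned in
// pre-order with id 0 for the root struct, so each top-level field's
// subtree occupies a contiguous range of ids.
std::vector<bool> selectPhysicalColumns(const std::vector<std::size_t>& subtreeSizes,
                                        const std::vector<int>& selectedFields) {
  // Physical id at which each top-level field's subtree starts.
  std::vector<std::size_t> firstId(subtreeSizes.size());
  std::size_t next = 1;  // id 0 is taken by the root struct
  for (std::size_t i = 0; i < subtreeSizes.size(); ++i) {
    firstId[i] = next;
    next += subtreeSizes[i];
  }

  std::vector<bool> include(next, false);
  include[0] = true;  // the root is an ancestor of every selected column
  for (int field : selectedFields) {
    assert(field >= 1 && static_cast<std::size_t>(field) <= subtreeSizes.size());
    const std::size_t begin = firstId[field - 1];
    const std::size_t end = begin + subtreeSizes[field - 1];
    for (std::size_t id = begin; id < end; ++id) {
      include[id] = true;  // select the field and every nested column
    }
  }
  return include;
}
```

For the testSeek.orc file in the reproducer, `subtreeSizes` would be {1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 4, 5}, and selecting field 11 marks ids 0 and 15 through 18 rather than id 11.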

Reproducer:
examples/TestOrcFile.testSeek.orc contains 12 top-level columns:
1: boolean
2-5: int
6-7: double
8: binary
9: string
10: struct<array<struct<int,string>>>
11: array<struct<int,string>>
12: map<string,struct<int,string>>

The physical layout in the file (column 0, the root struct, omitted) is:
1: boolean
2-5: int
6-7: double
8: binary
9: string
10: struct
11: array
12: struct
13: int
14: string
15: array
16: struct
17: int
18: string
19: map
20: string
21: struct
22: int
23: string
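The layout above follows from ORC flattening the type tree in pre-order, with the root struct at id 0. A small sketch reproduces it; `TypeNode`, `assignIds` and `testSeekSchema` are hypothetical illustration types and functions, not ORC's `Type` class, and primitive kinds are abbreviated as in this report.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal type tree for illustration; not ORC's Type class.
struct TypeNode {
  std::string kind;
  std::vector<TypeNode> children;
};

// Flatten the tree in pre-order: each node takes the next id, then its
// children are numbered left to right. layout[id] is the kind of column id.
void assignIds(const TypeNode& node, std::vector<std::string>& layout) {
  layout.push_back(node.kind);  // this node's id is layout.size() - 1
  for (const TypeNode& child : node.children) {
    assignIds(child, layout);
  }
}

// The testSeek.orc schema as listed in this report (kinds abbreviated).
TypeNode testSeekSchema() {
  const TypeNode intString{"struct", {{"int", {}}, {"string", {}}}};
  return TypeNode{"struct", {
      {"boolean", {}},                                  // field 1
      {"int", {}}, {"int", {}}, {"int", {}}, {"int", {}},  // fields 2-5
      {"double", {}}, {"double", {}},                   // fields 6-7
      {"binary", {}},                                   // field 8
      {"string", {}},                                   // field 9
      {"struct", {{"array", {intString}}}},             // field 10
      {"array", {intString}},                           // field 11
      {"map", {{"string", {}}, intString}},             // field 12
  }};
}
```

Calling `assignIds(testSeekSchema(), layout)` on an empty vector yields 24 entries: `layout[11]` is the array nested inside field 10, while top-level field 11 starts at `layout[15]`.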

When column 11 (array<struct<int,string>>) is selected, ReaderImpl actually reads column 10:
it treats 11 as a physical column id, and physical column 11 is a subcolumn of top-level
column 10.
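The misreading can be made concrete by mapping a physical id back to the top-level field that owns it. `owningTopLevelField` is a hypothetical helper that relies on the same pre-order assumption (root struct at id 0, each top-level subtree contiguous); it is not part of the ORC C++ API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of the ORC C++ API: return the 1-based
// top-level field whose pre-order subtree contains physicalId.
// subtreeSizes[i] counts the physical columns of top-level field i+1;
// id 0 is assumed to be the root struct.
int owningTopLevelField(const std::vector<std::size_t>& subtreeSizes,
                        std::size_t physicalId) {
  if (physicalId == 0) {
    throw std::out_of_range("id 0 is the root struct, not a top-level field");
  }
  std::size_t begin = 1;  // first id after the root
  for (std::size_t i = 0; i < subtreeSizes.size(); ++i) {
    const std::size_t end = begin + subtreeSizes[i];  // one past this subtree
    if (physicalId < end) {
      return static_cast<int>(i + 1);
    }
    begin = end;
  }
  throw std::out_of_range("physical id past the last column");
}
```

With the testSeek.orc subtree sizes {1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 4, 5}, physical id 11 belongs to field 10, whereas field 11 owns ids 15 through 18, which is exactly the mix-up described above.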



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
