hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Created] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]
Date Thu, 05 Mar 2015 00:03:39 GMT
Xuefu Zhang created HIVE-9863:

             Summary: Querying parquet tables fails with IllegalStateException [Spark Branch]
                 Key: HIVE-9863
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Xuefu Zhang

This does not necessarily happen only in the Spark branch. Queries such as select count(*) from table_name
fail with the following error:
hive> select * from content limit 2;
Failed with exception All the offsets
listed in the split should be found in the file. expected: [4, 4] found: [BlockMetaData{69644,
881917418 [ColumnMetaData{GZIP [guid] BINARY  [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP
[collection_name] BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type]
BIT_PACKED], 389887}, ColumnMetaData{GZIP [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY,
BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, PLAIN_DICTIONARY, BIT_PACKED],
422161}, ColumnMetaData{GZIP [meta_size] INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215},
ColumnMetaData{GZIP [content_size] INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP
[source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP [delete_flag] BOOLEAN
 [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP [meta] BINARY  [RLE, PLAIN, BIT_PACKED],
683834}, ColumnMetaData{GZIP [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of:
[4, 129785482, 260224757] in range 0, 134217728
Time taken: 0.253 seconds
I can reproduce the problem in both local and yarn-cluster modes. It seems to happen with MR as well.
Thus, I suspect this is a Parquet problem.
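
One way to read the numbers in the error message, offered as an assumption rather than a confirmed diagnosis: the split lists row-group offset 4 twice ("expected: [4, 4]"), while the file's row groups start at 4, 129785482 and 260224757, so only one row group matches and the count comparison fails. The Python sketch below only illustrates that kind of check. It is not Parquet's actual code, and the function name filter_row_groups is hypothetical.

```python
def filter_row_groups(split_offsets, file_row_group_starts):
    """Pick the file's row groups whose start offset is listed in the split.

    Hypothetical illustration: raises if the number of matched row groups
    differs from the number of offsets the split lists.
    """
    wanted = set(split_offsets)  # duplicate offsets collapse to one entry here
    found = [pos for pos in file_row_group_starts if pos in wanted]
    if len(found) != len(split_offsets):
        raise RuntimeError(
            "All the offsets listed in the split should be found in the file. "
            f"expected: {list(split_offsets)} found: {found} "
            f"out of: {list(file_row_group_starts)}"
        )
    return found


# Values copied from the error message in this report.
split_offsets = [4, 4]
file_starts = [4, 129785482, 260224757]

try:
    filter_row_groups(split_offsets, file_starts)
    outcome = "ok"
except RuntimeError as err:
    outcome = str(err)

print(outcome)
```

Under this reading, a split that lists each offset once (for example [4]) would pass the check, which would point at how the split's offset list is built rather than at the file itself. That is a guess to be verified against the Parquet source.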

This message was sent by Atlassian JIRA
