hive-user mailing list archives

From "Ravuri, Venkata Puneet" <vrav...@ea.com>
Subject Hive 0.13 count(*) query issue for S3 data storage
Date Mon, 25 Aug 2014 01:13:14 GMT
Hello,

I am using a Hadoop 2.5 and Hive 0.13 setup.
I have an external partitioned Hive table with files stored in S3 in RCFile format.
When I run a 'select *', the rows are returned correctly, but aggregation queries fail
with the following exception:

Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
                    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
                    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
                    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:601)
                    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
                    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
                    at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
                    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
                    at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
                    at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
                    at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
                    at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
                    at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
                    at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
                    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
                    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
                    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
                    ... 15 more

The same issue occurred with Hive 0.12, but disabling column pruning by setting the property
'hive.optimize.cp' to false resolved it.
In Hive 0.13 this property was removed (HIVE-4113, https://issues.apache.org/jira/browse/HIVE-4113).
Is there any configuration that needs to be changed for accessing RCFiles from S3 through
Hive?
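
For reference, the Hive 0.12 workaround mentioned above amounts to roughly the following. This is only a sketch: the table name s3_rcfile_table is a placeholder, not the actual table, and the SET line applies to Hive 0.12 only.

```sql
-- Hive 0.12 only: disable column pruning.
-- This property no longer exists in Hive 0.13 (HIVE-4113).
SET hive.optimize.cp=false;

-- Placeholder table name: an external, partitioned RCFile table stored in S3.
-- Aggregations such as this one are the queries that fail with the EOFException.
SELECT COUNT(*) FROM s3_rcfile_table;
```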


Thanks and Regards,
Puneet
