hadoop-common-issues mailing list archives

From "Venkata Puneet Ravuri (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-11270) Seek behavior difference between NativeS3FsInputStream and DFSInputStream
Date Fri, 14 Nov 2014 22:35:34 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkata Puneet Ravuri updated HADOOP-11270:
    Attachment: HADOOP-11270.patch

[~stevel@apache.org], HDFS currently supports seek(len(file)); it does not throw an error.
Code snippet from seek() in DFSInputStream.java:-

    if (targetPos > getFileLength()) {
      throw new EOFException("Cannot seek after EOF");
    }

The EOFException is thrown only when the seek position exceeds the length of the file.
Since our current filesystem spec doesn't mandate that an error be thrown at seek(len(file)),
I believe this behavior is acceptable.
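
To make the expected contract concrete, here is a minimal sketch (not part of the attached
patch) that exercises it through the generic FileSystem API; the path argument is just a
placeholder for any hdfs:// or s3n:// file:

import java.io.EOFException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekContractCheck {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);                     // e.g. hdfs://... or s3n://...
    FileSystem fs = path.getFileSystem(new Configuration());
    long len = fs.getFileStatus(path).getLen();

    try (FSDataInputStream in = fs.open(path)) {
      in.seek(len);                                    // works on HDFS today; the patch makes s3n behave the same
      System.out.println("seek(len) ok, read() = " + in.read());   // read() returns -1 at EOF

      try {
        in.seek(len + 1);                              // past EOF: expected to fail
      } catch (EOFException e) {
        System.out.println("seek(len + 1) rejected: " + e.getMessage());
      }
    }
  }
}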
We would need the same behavior in NativeS3FileSystem so that clients using FSDataInputStream
can seek irrespective of whether the scheme is hdfs or s3n.
I have submitted a patch that ensures NativeS3FileSystem behaves the same way when seeking
to the end of the file.
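
For context, the idea behind the change is roughly the following. This is only an
illustrative sketch, not the attached patch; the field names (pos, in, key, store) and the
fileLength value are assumptions about the stream's internals:

// Illustrative sketch only -- see HADOOP-11270.patch for the actual change.
// Goal: parity with DFSInputStream. Reject positions beyond len(file), but allow
// seeking to exactly len(file) without issuing a ranged getObject that starts at
// the object length (which S3 rejects).
@Override
public synchronized void seek(long newpos) throws IOException {
  if (newpos > fileLength) {
    throw new EOFException("Cannot seek after EOF");      // same message as DFSInputStream
  }
  if (pos != newpos) {
    in.close();
    if (newpos == fileLength) {
      in = new ByteArrayInputStream(new byte[0]);         // at EOF: next read() simply returns -1
    } else {
      in = store.retrieve(key, newpos);                   // ranged GET starting at newpos
    }
    pos = newpos;
  }
}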

Can you please review?

> Seek behavior difference between NativeS3FsInputStream and DFSInputStream
> -------------------------------------------------------------------------
>                 Key: HADOOP-11270
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11270
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Venkata Puneet Ravuri
>            Assignee: Venkata Puneet Ravuri
>         Attachments: HADOOP-11270.patch
> There is a difference in behavior when seeking within a given file present
> in S3 using NativeS3FileSystem$NativeS3FsInputStream and a file present in HDFS using
> DFSInputStream.
> If we seek to the end of the file in case of NativeS3FsInputStream, it fails with the exception
> "java.io.EOFException: Attempted to seek or read past the end of the file". That is because
> a getObject request is issued on the S3 object with the range start equal to the length of the file.
> This is the complete exception stack:-
> Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
> at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
> at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
> at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
> at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
> at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
> at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
> at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
> at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
> ... 15 more
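
For what it's worth, the call pattern in the stack trace above (RCFile skipping a value
buffer, which drives skipBytes() through BufferedFSInputStream.seek()) can be approximated
with a minimal standalone sketch; the path argument is a placeholder for an s3n:// file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SkipToEofRepro {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);                     // e.g. s3n://bucket/file
    FileSystem fs = path.getFileSystem(new Configuration());
    int len = (int) fs.getFileStatus(path).getLen();

    try (FSDataInputStream in = fs.open(path)) {
      // skipBytes() delegates to BufferedFSInputStream.skip(), which seeks to
      // getPos() + n; skipping the whole file therefore seeks to exactly len(file).
      // Without the patch, on s3n this reproduces the EOFException shown above.
      in.skipBytes(len);
    }
  }
}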

This message was sent by Atlassian JIRA
