hadoop-hdfs-dev mailing list archives

From "Andrew Rewoonenco (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-7151) DFSInputStream method seek works incorrectly on huge HDFS block size
Date Fri, 26 Sep 2014 14:20:33 GMT
Andrew Rewoonenco created HDFS-7151:
---------------------------------------

             Summary: DFSInputStream method seek works incorrectly on huge HDFS block size
                 Key: HDFS-7151
                 URL: https://issues.apache.org/jira/browse/HDFS-7151
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, fuse-dfs, hdfs-client
    Affects Versions: 2.5.1, 2.4.1, 2.5.0, 2.4.0, 2.3.0
         Environment: dfs.block.size > 2Gb
            Reporter: Andrew Rewoonenco
            Priority: Critical


Hadoop works incorrectly with block sizes larger than 2 GB.

The seek method of the DFSInputStream class uses an int (32-bit signed) internal value for seeking
inside the current block. This causes seek errors when the block size is greater than 2 GB.

Found when using very large Parquet files (10 GB) in Impala on a Cloudera cluster with a block
size of 10 GB.

Here is some log output:
W0924 08:27:15.920017 40026 DFSInputStream.java:1397] BlockReader failed to seek to 4390830898.
Instead, it seeked to 95863602.
W0924 08:27:15.921295 40024 DFSInputStream.java:1397] BlockReader failed to seek to 5597521814.
Instead, it seeked to 1302554518.

BlockReader seeks using only the low 32 bits of the offset: 4390830898 - 95863602 = 4294967296 = 2^32, and likewise 5597521814 - 1302554518.
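The truncation can be reproduced in isolation: casting the 64-bit target position to int drops exactly the high 32 bits, which yields the positions seen in the log above.

```java
public class SeekTruncation {
    public static void main(String[] args) {
        long targetPos = 4390830898L;
        // Narrowing a long to int keeps only the low 32 bits (JLS 5.1.3).
        int truncated = (int) targetPos;
        System.out.println(truncated);              // 95863602, as in the log
        System.out.println(targetPos - truncated);  // 4294967296, i.e. 2^32
    }
}
```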

The code fragment producing the bug:

    int diff = (int)(targetPos - pos);  // truncates the 64-bit offset difference to 32 bits
    if (diff <= blockReader.available()) {
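A minimal sketch of one possible fix, not the actual HDFS patch: keep the difference as a long so the comparison is done in 64-bit arithmetic (the int returned by available() is promoted to long). The canSeekWithinBlock helper and the available field here are hypothetical stand-ins for the real BlockReader state.

```java
public class SeekFix {
    // Hypothetical stand-in for blockReader.available() in the current block.
    static final int available = 1 << 20;  // pretend 1 MB is readable

    static boolean canSeekWithinBlock(long targetPos, long pos) {
        long diff = targetPos - pos;           // 64-bit arithmetic, no truncation
        return diff > 0 && diff <= available;  // int operand is promoted to long
    }

    public static void main(String[] args) {
        // With the buggy int cast, (int)(4390830898L - 95863602L) == 0,
        // so the in-block fast path would be taken by mistake.
        System.out.println(canSeekWithinBlock(4390830898L, 95863602L));  // false
    }
}
```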

Similar errors may exist in other parts of HDFS.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
