hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12222) Document and test BlockLocation for erasure-coded files
Date Wed, 13 Sep 2017 00:53:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163968#comment-16163968
] 

Hudson commented on HDFS-12222:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12856 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12856/])
HDFS-12222. Document and test BlockLocation for erasure-coded files. (wang: rev f4b6267465d139bfdaf75e25761672eaf61d8a11)
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BlockLocation.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/AbstractFileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystemWithECFile.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/LocatedFileStatus.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/fs/Hdfs.java


> Document and test BlockLocation for erasure-coded files
> -------------------------------------------------------
>
>                 Key: HDFS-12222
>                 URL: https://issues.apache.org/jira/browse/HDFS-12222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: Huafeng Wang
>              Labels: hdfs-ec-3.0-nice-to-have
>             Fix For: 3.0.0-beta1
>
>         Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, HDFS-12222.003.patch,
HDFS-12222.004.patch, HDFS-12222.005.patch, HDFS-12222.006.patch
>
>
> HDFS applications query block location information to compute splits. One example of
this is FileInputFormat:
> https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346
> You see bits of code like this that calculate offsets as follows:
> {noformat}
>     long bytesInThisBlock = blkLocations[startIndex].getOffset() + 
>                           blkLocations[startIndex].getLength() - offset;
> {noformat}
> EC confuses this since the block locations include parity block locations as well, which
are not part of the logical file length. This messes up the offset calculation and thus topology/caching
information too.
> Applications can figure out what's a parity block by reading the EC policy and then parsing
the schema, but it'd be a lot better if we exposed this more generically in BlockLocation
instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message