hadoop-hdfs-issues mailing list archives

From "Huafeng Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
Date Thu, 17 Aug 2017 07:31:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130050#comment-16130050 ]

Huafeng Wang commented on HDFS-12222:
-------------------------------------

Hi guys, I just uploaded an initial patch that only sketches the basic idea.
In the current implementation, when the underlying file system is HDFS, the LocatedFileStatus
that FileInputFormat (FIF) fetches is converted from an HdfsLocatedFileStatus. In the erasure
coding case, each BlockLocation actually represents a block group.
In this first patch, I added an ErasureCodedBlockLocation property to LocatedFileStatus; it
is set when the HdfsLocatedFileStatus is erasure coded.
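To illustrate the idea of separating data from parity within a block group, here is a minimal, self-contained Java sketch. It does not use the real Hadoop classes: `StripedBlockGroup`, `dataHosts` and `parityHosts` are hypothetical names, and the patch's actual ErasureCodedBlockLocation may look quite different. It assumes a Reed-Solomon style layout in which `hosts[i]` stores internal block `i`, with the `dataUnits` data blocks first and the `parityUnits` parity blocks after them, and that every internal block has exactly one host. Real locations are not guaranteed to be ordered or complete this way.

```java
import java.util.Arrays;
import java.util.List;

public class StripedBlockGroupDemo {

    /**
     * Hypothetical stand-in for an erasure-coded block location; not a Hadoop class.
     * Assumes hosts[i] stores internal block i: data blocks first, then parity blocks.
     */
    static final class StripedBlockGroup {
        final long offset;      // logical offset of the block group within the file
        final long length;      // logical (data-only) length of the block group
        final String[] hosts;   // one host per internal block
        final int dataUnits;    // e.g. 6 for an RS-6-3 policy
        final int parityUnits;  // e.g. 3 for an RS-6-3 policy

        StripedBlockGroup(long offset, long length, String[] hosts,
                          int dataUnits, int parityUnits) {
            if (hosts.length != dataUnits + parityUnits) {
                throw new IllegalArgumentException("expected one host per internal block");
            }
            this.offset = offset;
            this.length = length;
            this.hosts = hosts.clone();
            this.dataUnits = dataUnits;
            this.parityUnits = parityUnits;
        }

        /** Hosts storing data units, i.e. bytes that count toward the logical file length. */
        List<String> dataHosts() {
            return Arrays.asList(Arrays.copyOfRange(hosts, 0, dataUnits));
        }

        /** Hosts storing parity units only. */
        List<String> parityHosts() {
            return Arrays.asList(Arrays.copyOfRange(hosts, dataUnits, hosts.length));
        }
    }

    public static void main(String[] args) {
        String[] hosts = {"dn1", "dn2", "dn3", "dn4", "dn5", "dn6", "dn7", "dn8", "dn9"};
        // A full block group made of six 128 MiB data blocks plus three parity blocks.
        StripedBlockGroup group =
            new StripedBlockGroup(0L, 6L * 128 * 1024 * 1024, hosts, 6, 3);
        System.out.println("data hosts:   " + group.dataHosts());
        System.out.println("parity hosts: " + group.parityHosts());
    }
}
```

Running `main` prints `data hosts:   [dn1, dn2, dn3, dn4, dn5, dn6]` followed by `parity hosts: [dn7, dn8, dn9]`. A split calculator built on such a type could use only the data hosts for locality, since only they hold bytes inside the split's logical range.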

> Add EC information to BlockLocation
> -----------------------------------
>
>                 Key: HDFS-12222
>                 URL: https://issues.apache.org/jira/browse/HDFS-12222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: Huafeng Wang
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-12222.001.patch
>
>
> HDFS applications query block location information to compute splits. One example of this is FileInputFormat:
> https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346
> You see bits of code like this that calculate offsets as follows:
> {noformat}
>     long bytesInThisBlock = blkLocations[startIndex].getOffset() + 
>                           blkLocations[startIndex].getLength() - offset;
> {noformat}
> EC confuses this since the block locations include parity block locations as well, which are not part of the logical file length. This messes up the offset calculation and thus the topology/caching information too.
> Applications can figure out what's a parity block by reading the EC policy and then parsing the schema, but it'd be a lot better if we exposed this more generically in BlockLocation instead.
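The quoted offset arithmetic can be reproduced in isolation. The following is a simplified Java sketch, not the actual FileInputFormat code: `Loc` is a hypothetical class holding only a logical offset and length, and `blockIndex` is a hypothetical lookup standing in for the step that finds `startIndex` before the quoted line runs. Under this model the calculation only makes sense when every location covers a contiguous range of logical file bytes; a parity unit holds no logical file bytes, so it has no meaningful `Loc` of its own.

```java
public class SplitOffsetDemo {

    /** Hypothetical minimal block location: a contiguous range of logical file bytes. */
    static final class Loc {
        final long offset;
        final long length;

        Loc(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns the index of the location containing the logical byte at {@code offset}. */
    static int blockIndex(Loc[] locs, long offset) {
        for (int i = 0; i < locs.length; i++) {
            if (locs[i].offset <= offset && offset < locs[i].offset + locs[i].length) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }

    /** Bytes from {@code offset} to the end of its block, as in the quoted snippet. */
    static long bytesInThisBlock(Loc[] locs, long offset) {
        int startIndex = blockIndex(locs, offset);
        return locs[startIndex].offset + locs[startIndex].length - offset;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        long blockSize = 128 * mib;
        // A 300 MiB file: two full 128 MiB blocks and a 44 MiB tail block.
        Loc[] locs = {
            new Loc(0, blockSize),
            new Loc(blockSize, blockSize),
            new Loc(2 * blockSize, 44 * mib)
        };
        long offset = 200 * mib;  // falls inside the second block (128..256 MiB)
        System.out.println(bytesInThisBlock(locs, offset) / mib + " MiB");
    }
}
```

Running `main` prints `56 MiB` (128 + 128 - 200). The result is correct only because each `Loc` describes logical data bytes, which is the property the issue asks BlockLocation to make explicit for erasure-coded files.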



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


