Date: Mon, 25 Sep 2017 03:11:00 +0000 (UTC)
From: "Huafeng Wang (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-12534) Provide logical BlockLocations for
 EC files for better split calculation


    [ https://issues.apache.org/jira/browse/HDFS-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178479#comment-16178479 ] 

Huafeng Wang commented on HDFS-12534:
-------------------------------------

Hi [~andrew.wang], I have a question here. 
{quote}
Applications depend on HDFS BlockLocation to understand where the split points are.
{quote}
I think the logical BlockLocation currently returned for each block group already includes the locations of all of its data and parity blocks. Isn't this information enough? What is the difference between splitting a single block group and returning multiple logical BlockLocations here?


> Provide logical BlockLocations for EC files for better split calculation
> ------------------------------------------------------------------------
>
>                 Key: HDFS-12534
>                 URL: https://issues.apache.org/jira/browse/HDFS-12534
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Andrew Wang
>              Labels: hdfs-ec-3.0-must-do
>
> I talked to [~vanzin] and [~alex.behm] some more about split calculation with EC. It turns out HDFS-12222 was resolved prematurely. Applications depend on HDFS BlockLocation to understand where the split points are. The current scheme of returning one BlockLocation per block group loses this information.
> We should change this to provide logical blocks: divide the file length by the block size and return a matching BlockLocation for each logical block, with its own virtual offset and length.
> I'm not marking this as incompatible, since changing it this way would in fact make it more compatible from the perspective of applications that are scheduling against replicated files. Thus, it'd be good for beta1 if possible, but okay for later too.
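The logical-block scheme proposed in the issue can be sketched as follows. This is a minimal Python illustration, not HDFS code. `LogicalBlockLocation` and `logical_block_locations` are hypothetical names. The sketch assumes that one block group covers `block_size * data_units` logical bytes (for example, 6 data units under an RS-6-3 policy). It also assumes that each logical block simply reports the hosts of its enclosing block group.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LogicalBlockLocation:
    # Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation:
    # a virtual byte range of the file plus the hosts that serve it.
    offset: int
    length: int
    hosts: Tuple[str, ...]


def logical_block_locations(file_length: int,
                            block_size: int,
                            data_units: int,
                            group_hosts: Sequence[Tuple[str, ...]]) -> List[LogicalBlockLocation]:
    """Divide [0, file_length) into block_size-sized logical blocks.

    Assumptions (illustrative, not taken from HDFS): each block group holds
    block_size * data_units logical bytes, and group_hosts[i] lists the
    hosts that store block group i.
    """
    if block_size <= 0 or data_units <= 0:
        raise ValueError("block_size and data_units must be positive")
    group_size = block_size * data_units  # logical bytes per block group
    locations = []
    offset = 0
    while offset < file_length:
        # The last logical block may be shorter than block_size.
        length = min(block_size, file_length - offset)
        # group_size is a multiple of block_size, so a logical block never
        # straddles two block groups; it inherits its group's hosts.
        group = offset // group_size
        locations.append(LogicalBlockLocation(offset, length, tuple(group_hosts[group])))
        offset += length
    return locations


if __name__ == "__main__":
    MiB = 1 << 20
    # A 1000 MiB file with 128 MiB blocks and 6 data units spans two block groups.
    hosts = [("dn1", "dn2"), ("dn3", "dn4")]
    for loc in logical_block_locations(1000 * MiB, 128 * MiB, 6, hosts):
        print(loc.offset // MiB, loc.length // MiB, loc.hosts)
```

Consider a 1000 MiB file with 128 MiB blocks and 6 data units. The one-location-per-block-group scheme reports two BlockLocations. This sketch reports eight logical ones, with the last starting at 896 MiB and spanning 104 MiB. That is the same count a replicated file of that size would produce, which is what split calculators expect.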

