hadoop-hdfs-issues mailing list archives

From "Fengdong Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files
Date Tue, 25 Jun 2013 09:20:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692883#comment-13692883 ]

Fengdong Yu commented on HDFS-4931:

I don't think this is a good idea. If the data is placed on only a few datanodes, it is likely
that more map tasks will run on the same node.
> Extend the block placement policy interface to utilize the location information of previously stored files
> ------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-4931
>                 URL: https://issues.apache.org/jira/browse/HDFS-4931
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jihoon Son
>         Attachments: HDFS-4931.patch
> I am currently implementing a locality-preserving block placement policy that stores the files of a directory on the same datanode. That is, given a root directory, the files under it are grouped by the paths of their parent directories, and the files of each group are stored on the same datanode.
> When a new file is stored in HDFS, the block placement policy chooses the target datanode by considering the locations of previously stored files.
> The current block placement policy interface has some problems. The first problem is that there is no interface for retaining the locations of previously stored files when HDFS is restarted. To restore the location information of all files, this process should be done while the namenode is in safe mode.
> To solve the first problem, I modified the block placement policy interface and FSNamesystem. Before leaving safe mode, all necessary location information is sent to the block placement policy.
> However, my implementation changes too many access modifiers from private to public. This may violate the design of the interface.
> The second problem occurs when blocks are moved by the balancer or because of node failures. In this case, the block placement policy should recognize the current state and return a new datanode to which the blocks should be moved. However, the current interface does not support this.
> The attached patch solves the first problem but, as mentioned above, it may violate the design of the interface.
> Do you have any good ideas?
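
For readers following the thread, the grouping idea in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions, not the attached patch and not the HDFS BlockPlacementPolicy API: the class DirectoryGroupedPlacement and its methods groupOf, chooseTarget, and restore are hypothetical names, datanodes are plain strings, replication is ignored, and the round-robin choice for a new group is an assumption (the description does not say how the first datanode of a group is picked). The restore method stands in for the step of handing known file locations to the policy before the namenode leaves safe mode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not part of Hadoop and not the HDFS-4931 patch. */
class DirectoryGroupedPlacement {
    private final List<String> datanodes;
    // Maps a group (parent directory path) to the datanode that holds its files.
    private final Map<String, String> groupToNode = new HashMap<>();
    private int next = 0; // round-robin cursor for groups seen for the first time

    DirectoryGroupedPlacement(List<String> datanodes) {
        if (datanodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one datanode");
        }
        this.datanodes = new ArrayList<>(datanodes);
    }

    /** Group key of a file: the path of its parent directory. */
    static String groupOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    /** Reuse the datanode already assigned to the file's group, else assign one. */
    String chooseTarget(String path) {
        String group = groupOf(path);
        String node = groupToNode.get(group);
        if (node == null) {
            node = datanodes.get(next % datanodes.size());
            next++;
            groupToNode.put(group, node);
        }
        return node;
    }

    /** Rebuild the group mapping from known file locations (path -> datanode). */
    void restore(Map<String, String> fileLocations) {
        for (Map.Entry<String, String> e : fileLocations.entrySet()) {
            groupToNode.putIfAbsent(groupOf(e.getKey()), e.getValue());
        }
    }
}
```

Because every file in a directory maps to a single datanode here, the sketch also makes the concern raised in the comment concrete: a job that reads one directory would find all of its input on one node.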
