hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files
Date Tue, 25 Jun 2013 09:59:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692897#comment-13692897
] 

Steve Loughran commented on HDFS-4931:
--------------------------------------

I can see the benefits of this in some applications -though MR jobs aren't necessarily among
them, as scattering the blocks gives you better bandwidth. By keeping them all on one node, the max
bandwidth is the # of HDDs on that node, minus all other work going on on those disks. If scattered,
the bandwidth scales with the # of blocks of the file, minus other work going on against the same blocks.
To make things worse -any other code that is trying to access another file on the same machine
is going to fight for exactly the same set of hard disks.
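To put rough numbers on that, here is a back-of-envelope sketch in Python; all figures are
assumed, illustrative values, not measurements from any cluster:

```python
# Back-of-envelope model (illustrative numbers only) of effective read
# bandwidth for a 10-block file under the two placement strategies.

DISK_BW_MBPS = 100      # sequential bandwidth of one HDD (assumed)
DISKS_PER_NODE = 6      # HDDs in a single datanode (assumed)
BLOCKS_IN_FILE = 10     # blocks in the file being read

# Co-located: all blocks on one node, capped by that node's disks,
# regardless of how big the file is.
colocated_bw = DISKS_PER_NODE * DISK_BW_MBPS

# Scattered: each block can be read from a different node/disk in parallel,
# so bandwidth grows with the file's block count (before contention).
scattered_bw = BLOCKS_IN_FILE * DISK_BW_MBPS

print(colocated_bw, scattered_bw)  # 600 vs 1000 MB/s before contention
```

The gap widens as files get bigger: co-located bandwidth is a constant, while scattered
bandwidth keeps growing with the block count.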

# the failure mode of the cluster will change. You should look at that carefully. 
# you aren't going to handle a full disk very well, as at that point your constraints can't
be satisfied. 
# rebalance and recovery time will increase, as now all the rebalanced blocks are being directed
to a single server, limited by both the HDD and net bandwidth of that device, rather than
the aggregate bandwidth of the cluster. Assuming all three copies of a file's blocks are stored
only on 3 machines, you get hurt at both ends. As that time to recover increases, exposure
to multiple HDD/node failures increases too. 
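The third point can be made concrete with a similarly rough model; again, every number here
is an assumed, illustrative value, and the model ignores contention and replication scheduling:

```python
# Rough model of re-replication time after losing one datanode, comparing
# the default scattered policy with an all-blocks-on-few-nodes policy.
# All constants are assumed values for illustration.

LOST_DATA_GB = 1000          # replica data to re-create after the failure
NODE_NET_BW_GBPS = 1.0       # usable network bandwidth per node (assumed)
HEALTHY_NODES = 100          # nodes available as re-replication targets

# Scattered: the whole cluster shares the re-replication work.
scattered_hours = LOST_DATA_GB * 8 / (HEALTHY_NODES * NODE_NET_BW_GBPS) / 3600

# Co-located: writes funnel into the few nodes holding the file's other
# replicas, so bandwidth is capped by those machines.
TARGET_NODES = 3
colocated_hours = LOST_DATA_GB * 8 / (TARGET_NODES * NODE_NET_BW_GBPS) / 3600
```

In this toy model the co-located recovery takes HEALTHY_NODES / TARGET_NODES times longer,
and every extra hour of recovery is an extra hour of exposure to a second failure.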

I think it may be an interesting experiment, but you need to start looking at the impact of
failures, and the performance problems. Overall, though, I'm not convinced it scales well,
either to large files or large clusters -the latter offering the IO and network bandwidth
this policy would fail to exploit, and the highest failure rates. Normally that failure rate
is background noise, but with this placement policy, it may be more visible.

What may be more useful is revisiting Facebook's work on sub-cluster placement policy, where
all blocks of a file are stored in the same set of racks in a larger cluster. You get more
chance of rack locality for multiple blocks, and when a rack fails, while some files suffer
more, a lot of files suffer less -and recovery bandwidth is restricted to a fraction of the
net, which, on a multi-layered network, may protect the backbone.
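The rack-set idea can be sketched like this; this is a hypothetical illustration, where the
rack names, group size, and hash scheme are all made up rather than taken from the actual
Facebook or HDFS implementation:

```python
import hashlib

# Hypothetical sketch of rack-set placement: every block of a file goes to
# the same small group of racks, chosen deterministically from the file
# path. Constants below are illustrative assumptions.

RACKS = [f"/rack{i}" for i in range(20)]
RACKS_PER_GROUP = 3   # assumed size of each rack set

def rack_group_for(path: str) -> list[str]:
    """Deterministically map a file path to a fixed set of racks."""
    h = int(hashlib.md5(path.encode()).hexdigest(), 16)
    start = h % len(RACKS)
    return [RACKS[(start + i) % len(RACKS)] for i in range(RACKS_PER_GROUP)]
```

Because the mapping is deterministic, all blocks of one file land in the same rack set,
a rack failure only touches files whose set contains that rack, and re-replication traffic
stays within a few racks rather than crossing the backbone.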

Because it's experimental and has scale issues, I don't see a rush to commit patches to support
it unless it's backed up by the theory and the data justifying this tactic.

                
> Extend the block placement policy interface to utilize the location information of previously
stored files  
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4931
>                 URL: https://issues.apache.org/jira/browse/HDFS-4931
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jihoon Son
>         Attachments: HDFS-4931.patch
>
>
> Currently, I'm implementing a locality-preserving block placement policy which stores the
files in a directory on the same datanode. That is to say, given a root directory, files under
the root directory are grouped by the paths of their parent directories, and the files of
a group are stored on the same datanode. 
> When a new file is stored in HDFS, the block placement policy chooses the target datanode
considering the locations of previously stored files. 
> In the current block placement policy interface, there are some problems. The first problem
is that there is no interface to restore the locations of previously stored files when HDFS is restarted.
To rebuild the location information of all files, this process should be done during the safe
mode of the namenode.
> To solve the first problem, I modified the block placement policy interface and FSNamesystem
so that, before leaving safe mode, all the necessary location information is sent to the block
placement policy. 
> However, my implementation requires too many access modifiers to change from private to
public. This may violate the design of the interface. 
> The second problem occurs when some blocks are moved by the balancer or by node failures.
In this case, the block placement policy should recognize the current status and return a
new datanode to which to move the blocks. However, the current interface does not support this. 
> The attached patch solves the first problem, but as mentioned above, it may violate
the design of the interface. 
> Do you have any good ideas?
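The stateful directory-to-datanode mapping described in the report above might look roughly
like this; this is a hypothetical Python sketch, not the real BlockPlacementPolicy API or the
attached patch, and all names are illustrative. The in-memory map is exactly the state that
would have to be rebuilt during safe mode after a namenode restart:

```python
import itertools
import os

# Hypothetical sketch of the stateful directory-locality policy: the first
# file under a directory fixes a datanode, and later siblings reuse it.
# Datanode names and the round-robin assignment are illustrative assumptions.

DATANODES = [f"dn{i}" for i in range(10)]
_next_node = itertools.cycle(DATANODES)
_dir_to_node: dict[str, str] = {}   # the state lost on restart

def choose_target(path: str) -> str:
    """Return the datanode for a new file, keeping siblings together."""
    parent = os.path.dirname(path)
    if parent not in _dir_to_node:
        _dir_to_node[parent] = next(_next_node)  # first file fixes the mapping
    return _dir_to_node[parent]
```

Files in the same directory always map to the same datanode, which is the locality property
the reporter wants; the cost is that `_dir_to_node` must be reconstructed from the existing
block locations before the policy can make consistent decisions again.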

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
