hadoop-hdfs-issues mailing list archives

From "Nikola Vujic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5168) BlockPlacementPolicy does not work for cross node group dependencies
Date Sat, 26 Apr 2014 10:37:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981937#comment-13981937
] 

Nikola Vujic commented on HDFS-5168:
------------------------------------

Hi [~djp],

I verified that org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager.testFormat
passes. This test has nothing to do with this change.

Second, I ran org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality
and it failed. I double-checked the code and could not find any correlation between the failure
and this change. I then checked out the trunk branch and ran the test again. It FAILED there as well.
I don't know why we have a unit test that fails in trunk.

Here is the failure trace:
java.lang.AssertionError: expected:<1800> but was:<1814>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)


I will resubmit the patch just to trigger the system to run all unit tests again.


> BlockPlacementPolicy does not work for cross node group dependencies
> --------------------------------------------------------------------
>
>                 Key: HDFS-5168
>                 URL: https://issues.apache.org/jira/browse/HDFS-5168
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>            Priority: Critical
>         Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch
>
>
> Block placement policies do not work for cross rack/node group dependencies. This matters
in practice when compute servers and storage fall into two independent fault domains: in that
case neither BlockPlacementPolicyDefault nor BlockPlacementPolicyWithNodeGroup is able to provide
proper block placement.
> Let's suppose that we have a Hadoop cluster with one rack containing two servers, and we run
2 VMs per server. The node group topology for this cluster would be:
>  server1-vm1 -> /d1/r1/n1
>  server1-vm2 -> /d1/r1/n1
>  server2-vm1 -> /d1/r1/n2
>  server2-vm2 -> /d1/r1/n2
> This works fine as long as server and storage fall into the same fault domain, but
if storage is in a different fault domain from the server, we are not able to handle that.
For example, if the storage of server1-vm1 is in the same fault domain as the storage of server2-vm1,
then we must not place two replicas on these two nodes, even though they are in different node
groups.
> Two possible approaches:
> - One approach would be to define cross rack/node group dependencies and to use them
when excluding nodes from the search space. This looks like the cleanest way to fix this, as
it requires only minor changes in the BlockPlacementPolicy classes.
> - The other approach would be to allow nodes to fall into more than one node group. When we
choose a node to hold a replica, we have to exclude from the search space all nodes from the
node groups to which the chosen node belongs. This approach may require major changes in the
NetworkTopology.
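
To make the first approach concrete, here is a minimal sketch of dependency-aware exclusion
using the four-node topology from the description. It is only an illustration: the class, the
maps and the method names are hypothetical and do not come from the HDFS code base or from the
attached patch, and the dependency between server1-vm1 and server2-vm1 is the assumed example
from the text above.

```java
import java.util.*;

// Hypothetical sketch: none of these names exist in HDFS; they only illustrate the idea.
class DependencyAwarePlacementSketch {

    // Node name -> node group path, as in the topology from the issue description.
    static final Map<String, String> NODE_GROUP = new LinkedHashMap<>();

    // Node name -> nodes whose storage shares a fault domain with it (assumed input).
    static final Map<String, Set<String>> DEPENDENCIES = new HashMap<>();

    static {
        NODE_GROUP.put("server1-vm1", "/d1/r1/n1");
        NODE_GROUP.put("server1-vm2", "/d1/r1/n1");
        NODE_GROUP.put("server2-vm1", "/d1/r1/n2");
        NODE_GROUP.put("server2-vm2", "/d1/r1/n2");
        // Assumed example: storage of server1-vm1 and server2-vm1 is in one fault domain.
        addDependency("server1-vm1", "server2-vm1");
    }

    // Dependencies are symmetric, so record them in both directions.
    static void addDependency(String a, String b) {
        DEPENDENCIES.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        DEPENDENCIES.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Nodes removed from the search space once 'chosen' holds a replica:
    // every node in the same node group (including 'chosen') plus its dependencies.
    static Set<String> excludedAfterChoosing(String chosen) {
        Set<String> excluded = new TreeSet<>();
        String group = NODE_GROUP.get(chosen);
        for (Map.Entry<String, String> e : NODE_GROUP.entrySet()) {
            if (e.getValue().equals(group)) {
                excluded.add(e.getKey());
            }
        }
        excluded.addAll(DEPENDENCIES.getOrDefault(chosen, Collections.<String>emptySet()));
        return excluded;
    }

    // Remaining candidates for the next replica, in topology order.
    static List<String> candidatesAfterChoosing(String chosen) {
        Set<String> excluded = excludedAfterChoosing(chosen);
        List<String> candidates = new ArrayList<>();
        for (String node : NODE_GROUP.keySet()) {
            if (!excluded.contains(node)) {
                candidates.add(node);
            }
        }
        return candidates;
    }
}
```

With only the node group check, server2-vm1 would remain a valid target after server1-vm1 is
chosen. With the dependency added to the excluded set, the only remaining candidate in this
example is server2-vm2.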



--
This message was sent by Atlassian JIRA
(v6.2#6252)
