hadoop-user mailing list archives

From Brahma Reddy Battula <brahmareddy.batt...@hotmail.com>
Subject Re: Ensure High Availability of Datanodes in a HDFS cluster
Date Sat, 01 Jul 2017 00:05:33 GMT

1. Yes, those settings will ensure that the file is written to the available nodes.


2. BlockManager: defaultReplication = 2

This is the default block replication which you configured on the server (NameNode). The actual
number of replicas can be specified when the file is created. The default is used only if replication
is not specified at create time.

3. "dfs.replication" is a client-side property (the client in your case is Confluent Kafka).
You may want to cross-check this configuration in Kafka.
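
The precedence described in points 2 and 3 can be sketched as a small model. This is not HDFS code: the function name `effective_replication` and the dictionary standing in for the client configuration are hypothetical, and the fallback value of 3 assumes the writing client only sees the stock hdfs-default.xml, where dfs.replication is 3.

```python
# Illustrative model only; not taken from the HDFS source.
HDFS_DEFAULT_REPLICATION = 3  # dfs.replication as shipped in hdfs-default.xml


def effective_replication(explicit=None, client_conf=None):
    """Return the replication factor a new file would get in this model."""
    if explicit is not None:
        # A replication factor passed at create time takes precedence.
        return explicit
    conf = client_conf or {}
    # Otherwise the writer's own configuration decides; if it has no
    # override, the shipped default applies.
    return int(conf.get("dfs.replication", HDFS_DEFAULT_REPLICATION))


# A client that never loaded the cluster's hdfs-site.xml:
print(effective_replication())
# A client whose configuration carries dfs.replication = 2:
print(effective_replication(client_conf={"dfs.replication": "2"}))
# An explicit create-time value overrides the client configuration:
print(effective_replication(explicit=1, client_conf={"dfs.replication": "2"}))
```

Under this model, a writer that does not pick up dfs.replication = 2 would create files with replication 3, which is consistent with the listing in the original question.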

-Brahma Reddy Battula
From: Nishant Verma <nishant.verma0702@gmail.com>
Sent: Friday, June 30, 2017 7:50 PM
To: common-user@hadoop.apache.org
Subject: Ensure High Availability of Datanodes in a HDFS cluster


I have an HDFS cluster with two masters and three datanodes. They are AWS EC2 instances.

I have to test High Availability of Datanodes, i.e., if a datanode dies during a load run while
data is being written to HDFS, there should be no data loss. The two remaining datanodes which are
alive should take care of the data writes.

I have set the below properties in hdfs-site.xml. dfs.replication = 2 (so that if any one datanode
dies, the replication factor can still be met)

dfs.client.block.write.replace-datanode-on-failure.policy = ALWAYS
dfs.client.block.write.replace-datanode-on-failure.enable = true
dfs.client.block.write.replace-datanode-on-failure.best-effort = true
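
For reference, the four settings above can be rendered in the `<configuration>`/`<property>` layout that hdfs-site.xml uses. The following is a minimal sketch; the helper name `build_hdfs_site` and the `SETTINGS` dictionary are hypothetical, and the values are simply the ones listed in this message.

```python
import xml.etree.ElementTree as ET

# The property names and values quoted in this message.
SETTINGS = {
    "dfs.replication": "2",
    "dfs.client.block.write.replace-datanode-on-failure.policy": "ALWAYS",
    "dfs.client.block.write.replace-datanode-on-failure.enable": "true",
    "dfs.client.block.write.replace-datanode-on-failure.best-effort": "true",
}


def build_hdfs_site(settings):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


xml_text = ET.tostring(build_hdfs_site(SETTINGS), encoding="unicode")
print(xml_text)
```

Since the dfs.client.* properties are read by the writing client, the generated fragment would presumably need to be present in the configuration that the writer loads, not only on the NameNode.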

My questions are:

1 - Does setting the above properties suffice for Datanode High Availability? Or is something else
needed?

2 - On dfs service startup, I see the below INFO messages in the namenode logs:

2017-06-27 10:51:52,546 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication = 2
2017-06-27 10:51:52,546 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication = 512
2017-06-27 10:51:52,546 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication = 1
2017-06-27 10:51:52,546 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams = 2

But I still see that the files being created on HDFS have replication factor 3. Why is
that so? This would hurt my High Availability of Datanodes.

-rw-r--r--   3 hadoopuser supergroup     247373 2017-06-29 09:36 /topics/testTopic/year=2017/month=06/day=29/hour=14/testTopic+210+0001557358+0001557452
-rw-r--r--   3 hadoopuser supergroup       1344 2017-06-29 08:33 /topics/testTopic/year=2017/month=06/day=29/hour=14/testTopic+228+0001432839+0001432850
-rw-r--r--   3 hadoopuser supergroup       3472 2017-06-29 09:03 /topics/testTopic/year=2017/month=06/day=29/hour=14/testTopic+228+0001432851+0001432881
-rw-r--r--   3 hadoopuser supergroup       2576 2017-06-29 08:33 /topics/testTopic/year=2017/month=06/day=29/hour=14/testTopic+23+0001236477+0001236499
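
In a listing like the one above, the second column is the replication factor of each file. The following is a minimal sketch for pulling that column out of such output; the function name `replication_by_path` is hypothetical, and the sample text is copied from two of the lines above.

```python
def replication_by_path(ls_output):
    """Map each listed file path to the replication factor in column 2."""
    result = {}
    for line in ls_output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        # Skip blank lines, summary headers, and directories, whose
        # replication column is "-" rather than a number.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        result[fields[7]] = int(fields[1])
    return result


sample = """\
-rw-r--r--   3 hadoopuser supergroup     247373 2017-06-29 09:36 /topics/testTopic/year=2017/month=06/day=29/hour=14/testTopic+210+0001557358+0001557452
-rw-r--r--   3 hadoopuser supergroup       1344 2017-06-29 08:33 /topics/testTopic/year=2017/month=06/day=29/hour=14/testTopic+228+0001432839+0001432850
"""

expected = 2  # the dfs.replication value configured in this message
for path, factor in replication_by_path(sample).items():
    if factor != expected:
        print(f"{path}: replication {factor}, expected {expected}")
```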

P.S. - My records are written to HDFS by the Confluent Kafka Connect HDFS Sink Connector.


