hadoop-user mailing list archives

From MARCOS MEDRADO RUBINELLI <marc...@buscapecompany.com>
Subject Re: UNDERSTANDING HADOOP PERFORMANCE
Date Thu, 11 Apr 2013 11:14:23 GMT
dfs.namenode.handler.count and dfs.datanode.handler.count control how many threads the namenode
and each datanode, respectively, run to handle incoming requests concurrently. The default values
should be fine for smaller clusters, but if you have a lot of simultaneous HDFS operations, you
may see performance gains by increasing these numbers. Just make sure you have the memory to
spare, and adjust your heap sizes accordingly.
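To make "concurrent threads" concrete, here is a minimal Python sketch of a fixed-size handler pool, loosely modeled on the way an RPC server hands incoming calls to a bounded set of handler threads. It is not Hadoop's implementation: `peak_concurrency` and its parameters are hypothetical, and `time.sleep` stands in for the real work of a request. It shows that no more than `handler_count` requests are ever served at once; the rest wait in a queue until a handler is free.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(handler_count, num_requests, service_time_s=0.01):
    """Serve num_requests simulated calls with a fixed pool of handler_count
    threads and return the largest number that were in progress at once.

    Hypothetical model only: this is not Hadoop's IPC server code."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(_request_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(service_time_s)  # stand-in for the real work of a request
        with lock:
            active -= 1

    # Requests beyond handler_count wait in the executor's internal queue,
    # much as excess calls wait until a handler thread becomes free.
    with ThreadPoolExecutor(max_workers=handler_count) as pool:
        list(pool.map(handle, range(num_requests)))
    return peak


if __name__ == "__main__":
    for handlers in (3, 10):
        start = time.perf_counter()
        peak = peak_concurrency(handlers, num_requests=100)
        elapsed = time.perf_counter() - start
        print(f"{handlers} handlers: peak {peak} concurrent, {elapsed:.2f} s total")
```

With 100 requests of 10 ms each, the 3-handler run needs about 34 rounds of work versus 10 for the 10-handler run, so it should take noticeably longer. That is the throughput gain described above; the trade-off is more threads and more in-flight requests held in memory, hence the advice about heap sizes.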

dfs.heartbeat.interval and dfs.blockreport.intervalMsec will affect performance in larger
clusters. Each datanode sends the namenode a heartbeat, a message saying it is still alive, every
dfs.heartbeat.interval seconds; after dfs.namenode.stale.datanode.interval milliseconds without
a heartbeat, the namenode marks that datanode as stale. Similarly, each datanode sends a list of
all the blocks it holds every dfs.blockreport.intervalMsec milliseconds. With the values you
listed (3 seconds and 3,600,000 ms, i.e. one hour) on a cluster of 30 machines, the namenode
receives a heartbeat, on average, every 0.1 seconds (3 s / 30) and a block report every 2 minutes
(60 min / 30), which should be a negligible load and worth the extra reliability.
If your block reports are taking too long, that's a sign that you have too many small files
and should look into archiving or consolidating them somehow. Personally, I ran into trouble
at around 1 million blocks per datanode.
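The arithmetic above can be reproduced with a small Python sketch. It assumes the datanodes' schedules are spread evenly over time, so the namenode sees one message every interval / N on average; the function names are hypothetical. The second helper illustrates the small-files point: every non-empty file occupies at least one block however small it is, so many small files inflate the number of block replicas each block report must list. The 128 MiB block size and replication factor of 3 in the usage section are assumed values for illustration.

```python
import math


def namenode_arrival_gaps(num_datanodes, heartbeat_interval_s, blockreport_interval_ms):
    """Average time in seconds between consecutive heartbeats, and between
    consecutive block reports, arriving at the namenode, assuming the
    datanodes' schedules are spread evenly over time."""
    heartbeat_gap_s = heartbeat_interval_s / num_datanodes
    blockreport_gap_s = (blockreport_interval_ms / 1000) / num_datanodes
    return heartbeat_gap_s, blockreport_gap_s


def replicas_per_datanode(file_sizes_bytes, block_size_bytes, replication, num_datanodes):
    """Average number of block replicas each datanode lists in its block report.
    A non-empty file needs at least one block, no matter how small it is."""
    blocks = sum(math.ceil(size / block_size_bytes) for size in file_sizes_bytes)
    return blocks * replication / num_datanodes


if __name__ == "__main__":
    # 30 datanodes with the interval values listed in the question.
    hb_gap, br_gap = namenode_arrival_gaps(30, heartbeat_interval_s=3,
                                           blockreport_interval_ms=3_600_000)
    print(f"heartbeat every {hb_gap:.2f} s, block report every {br_gap / 60:.1f} min")

    # Roughly 1 GiB of data stored two ways, with an assumed 128 MiB block size
    # and replication factor 3 on the same 30 datanodes.
    block_size = 128 * 1024 ** 2
    one_big_file = [1024 ** 3]
    many_small_files = [100 * 1024] * 10_000
    print(replicas_per_datanode(one_big_file, block_size, 3, 30))
    print(replicas_per_datanode(many_small_files, block_size, 3, 30))
```

Storing about the same amount of data as 10,000 small files multiplies the per-datanode replica count by more than a thousand compared with one large file, and every one of those replicas has to appear in each block report.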

dfs.namenode.decommission.interval is only used when removing datanodes from the cluster.
You can safely ignore it.

Regards,
Marcos

On 11-04-2013 07:19, Dibyendu Karmakar wrote:

Hi everyone,
I am testing Hadoop performance. I have come across the following parameters:
1. dfs.replication
2. dfs.block.size
3. dfs.heartbeat.interval   (default: 3)
4. dfs.blockreport.intervalMsec   (default: 3600000)
5. dfs.namenode.handler.count   (default: 10)
6. dfs.datanode.handler.count   (default: 3)
7. dfs.replication.interval    (default: 3)
8. dfs.namenode.decommission.interval    (default: 300)

I have successfully tested parameters 1 and 2, but the rest of the
parameters, starting from dfs.heartbeat.interval, are confusing me a lot.

If I increase those parameters, will Hadoop perform better (considering
read and write operations separately)?
Or do I have to decrease them to make Hadoop perform better?

Could anyone please help? If possible, please explain
dfs.namenode.handler.count and dfs.datanode.handler.count, i.e. what
do these two parameters do?

Thank you


