hadoop-hdfs-user mailing list archives

From Chackravarthy Esakkimuthu <chaku.mi...@gmail.com>
Subject Re: Guideline on setting Namenode RPC Handler count (client and service)
Date Tue, 03 May 2016 13:08:31 GMT
Thanks Brahma for the reply,

I will look into the issue you mentioned. (Yes, we are using 2.6.0 (HDP 2.2).)

On Tue, May 3, 2016 at 6:04 PM, Brahma Reddy Battula <
brahmareddy.battula@huawei.com> wrote:

> I assume you are using the hadoop-2.6 release.
>
> Since you are targeting the time taken to process the block report, your
> proposed config options (changing *ipc.ping.interval* and the *split
> threshold*) should be fine. I mean the 2nd and 3rd options.
>
> You can try them once and let us know.
>
>
>
>
>
> I saw a related issue recently; you may want to have a look at
> *HDFS-10301*.
>
>
>
>
>
>
>
> --Brahma Reddy Battula
>
>
>
> *From:* Chackravarthy Esakkimuthu [mailto:chaku.mitcs@gmail.com]
> *Sent:* 03 May 2016 18:10
> *To:* Gokul
> *Cc:* user@hadoop.apache.org
> *Subject:* Re: Guideline on setting Namenode RPC Handler count (client
> and service)
>
>
>
> To add more details on why NN startup was delayed when the handler count
> was set to 600.
>
>
>
> We are seeing many duplicate full block reports (FBRs) from most of the
> DNs for a long time (around 3 hours after NN startup), even though the NN
> comes out of safe mode in 10 to 15 mins. Because the NN is already out of
> safe mode, the duplicate FBRs are not rejected.
>
>
>
> This is because the DN times out (ipc.ping.interval=60s by default) on the
> block report RPC call before the NN finishes processing it (which takes
> around 70-80 secs). Hence the DN does not realise that the FBR was
> processed and keeps trying to send it again. But the NN has already
> processed it and hits an error only when sending the response.
>
>
>
> The reasons why the NN takes more than 1 min to process an FBR:
>
>    - An FBR contains an array of storageBlockReport entries (the number of
>    data directories configured is 10).
>    - The namesystem write lock is acquired separately for each
>    storageBlockReport, so a single handler thread cannot process the whole
>    FBR in one go once it acquires the lock.
>    - There is lock contention with the other 599 handler threads, which
>    are also busy processing FBRs from all DNs. Hence each lock acquisition
>    is delayed before the next storageBlockReport gets processed.
>
>
>    - t -> storageBlockReport[0]   --> Handler thread starts FBR processing.
>    - t + 5s -> storageBlockReport[1]
>    - t + 12s -> storageBlockReport[2]
>    - ...
>    - ...
>    - t + 70s -> storageBlockReport[9]  --> Handler thread completes FBR
>    processing.
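
As a rough sketch of the timeline above (not NameNode code), the elapsed time of one FBR can be modelled as the number of storage reports multiplied by the per-report lock wait plus processing time. The function name and the 7 s average lock wait are assumptions read off the timeline; 100 ms is the upper per-report processing time quoted from the logs elsewhere in this message.

```python
def fbr_elapsed_ms(num_storages, avg_lock_wait_ms, avg_process_ms):
    """Elapsed time for one FBR when the write lock is re-acquired
    once per storageBlockReport (simplified model, assumption)."""
    return num_storages * (avg_lock_wait_ms + avg_process_ms)


RPC_TIMEOUT_MS = 60_000    # ipc.ping.interval default quoted in the thread
NUM_STORAGES = 10          # data directories per DN, from the thread
AVG_LOCK_WAIT_MS = 7_000   # assumption: roughly 70 s spread over 10 reports
AVG_PROCESS_MS = 100       # upper bound of the 20ms-100ms seen in the logs

elapsed_ms = fbr_elapsed_ms(NUM_STORAGES, AVG_LOCK_WAIT_MS, AVG_PROCESS_MS)
timed_out = elapsed_ms > RPC_TIMEOUT_MS

print(f"FBR elapsed: {elapsed_ms / 1000:.1f}s, DN times out: {timed_out}")
```

Under these assumed numbers almost all of the elapsed time is lock wait rather than processing, which matches the observation that the DN gives up before the NN replies.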
>
>
>
> We are looking for suggestions to resolve this delayed start of the NN.
> (Delayed start means that even though the NN comes out of safe mode, the
> duplicate FBRs keep service RPC latency high, and heartbeats are skipped
> for more than 1 minute continuously.)
>
>
>
> Possible config options are :
>
>    1. The current value of dfs.blockreport.initialDelay is 120s. This can
>    be increased to 10 - 15 mins to avoid a block report storm.
>    2. Increase ipc.ping.interval from 60s to 90s or so.
>    3. Decrease dfs.blockreport.split.threshold to 100k (from 1M) so that
>    block reports from the DN are sent separately for each storage. The DN
>    would then get each response from the NN quickly. But this could delay
>    heartbeats, as each RPC call might consume up to the 60 sec timeout.
>    Hence a heartbeat might be delayed by 590s (worst case, if all 10 RPC
>    calls succeed while taking 59s each).
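
The 590s worst case in option 3 can be checked with a small calculation. This is only a sketch: it assumes the per-storage block report RPCs are sent one after another and that no heartbeat goes out until the last one returns, which is what the option describes. The function name is made up for illustration.

```python
def worst_case_heartbeat_delay_s(num_storages, per_rpc_s):
    """Worst-case heartbeat delay if each per-storage report RPC
    runs sequentially and just avoids the timeout (assumption)."""
    return num_storages * per_rpc_s


NUM_STORAGES = 10   # data directories per DN, from the thread
PER_RPC_S = 59      # each call succeeds just under the 60 s timeout

delay_s = worst_case_heartbeat_delay_s(NUM_STORAGES, PER_RPC_S)
print(f"Worst-case heartbeat delay: {delay_s}s")
```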
>
> Or can we move the write lock to a higher level: take it once, process
> all storageBlockReports, and release it? From the logs, we have seen that
> processing each storageBlockReport takes 20ms-100ms, so a single FBR would
> hold the lock for at most about 1s. Also, since FBR calls are not that
> frequent (one block report every 6 hours in our cluster, or when a disk
> failure happens), is it OK to reduce the lock granularity in this way?
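
For reference, the lock hold time implied by those log figures, if the write lock were taken once per FBR, works out as below. This is a back-of-the-envelope sketch that ignores any time spent waiting for the lock itself; the function name is hypothetical.

```python
def single_lock_hold_ms(num_storages, per_report_ms):
    """Time the write lock would be held if all storage reports of one
    FBR were processed under a single acquisition (simplified)."""
    return num_storages * per_report_ms


NUM_STORAGES = 10  # data directories per DN, from the thread

best_ms = single_lock_hold_ms(NUM_STORAGES, 20)    # 20 ms per report
worst_ms = single_lock_hold_ms(NUM_STORAGES, 100)  # 100 ms per report
print(f"Lock held per FBR: {best_ms}ms to {worst_ms}ms")
```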
>
>
>
> Please share your suggestions, and correct me if I am wrong.
>
>
>
> Thanks,
>
> Chackra
>
>
>
>
>
> On Mon, May 2, 2016 at 2:12 PM, Gokul <gokulakannan.m@gmail.com> wrote:
>
> *bump*
>
>
>
> On Fri, Apr 29, 2016 at 5:00 PM, Chackravarthy Esakkimuthu <
> chaku.mitcs@gmail.com> wrote:
>
> Hi,
>
>
>
> Is there any recommendation or guideline on setting the number of RPC
> handlers in the Namenode based on cluster size (number of datanodes)?
>
>
>
> Cluster details :
>
>
>
> No of datanodes - 1200
>
> NN hardware - 74G heap allocated to NN process, 40 core machine
>
> Total blocks - 80M+
>
> Total Files/Directories - 60M+
>
> Total FSObjects - 150M+
>
>
>
> We have isolated service and client RPC by enabling service-rpc.
>
>
>
> Currently dfs.namenode.handler.count=400 and
> dfs.namenode.service.handler.count=200
>
>
>
> Is 200 a good fit for this cluster, or is any change recommended? Please
> help out.
>
>
>
> Thanks in advance!
>
>
>
> (We have tried increasing the service handler count to 600 and saw a delay
> in NN startup time, after which it looked quite stable. Setting it to 200
> decreases the startup delay, but rpcQueueTime and rpcAvgProcessingTime are
> slightly higher compared to the 600 handler count.)
>
>
>
> Thanks,
>
> Chackra
>
>
>
>
>
> --
>
> Thanks and Regards,
> Gokul
>
>
>
