hadoop-user mailing list archives

From lei liu <liulei...@gmail.com>
Subject Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.
Date Fri, 16 Aug 2013 02:34:06 GMT
Hi Jitendra,

I don't use the compression parameter.

My network card is 100M/s, and I set "dfs.image.transfer.bandwidthPerSec"
to 50M, so I think the Active NameNode still has 50M of bandwidth left to
handle RPC requests. Why did the OPS drop by 50%?
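
To make the second question in the quoted message concrete, here is a rough sketch of the arithmetic in Python. It assumes that "50M" means 50 MiB per second (treating the setting as bytes per second is an assumption about the units), and it takes the fsimage size and the 250.23s transfer time from the TransferFsImage log lines quoted below. The function and constant names are made up for illustration.

```python
# Figures copied from the TransferFsImage log lines quoted in this thread.
FSIMAGE_BYTES = 2634222089
OBSERVED_SECONDS = 250.23

# Assumption: "50M" means 50 MiB per second.
THROTTLE_BYTES_PER_SEC = 50 * 1024 * 1024


def transfer_seconds(size_bytes, bytes_per_sec):
    """Time needed to move size_bytes at a steady bytes_per_sec."""
    return size_bytes / bytes_per_sec


def rate_kb_per_sec(size_bytes, seconds):
    """Average rate in KB/s (1 KB = 1024 bytes, as in the log output)."""
    return size_bytes / seconds / 1024


def rate_mbit_per_sec(size_bytes, seconds):
    """Average rate in megabits per second (1 Mbit = 1,000,000 bits)."""
    return size_bytes * 8 / seconds / 1_000_000


expected = transfer_seconds(FSIMAGE_BYTES, THROTTLE_BYTES_PER_SEC)
observed_kb = rate_kb_per_sec(FSIMAGE_BYTES, OBSERVED_SECONDS)
observed_mbit = rate_mbit_per_sec(FSIMAGE_BYTES, OBSERVED_SECONDS)

print(f"expected time at the throttle: {expected:.1f} s")
print(f"observed rate: {observed_kb:.2f} KB/s ({observed_mbit:.1f} Mbit/s)")
```

With these inputs the throttle alone would let the transfer finish in roughly 50s, while the logged rate works out to about 84 Mbit/s. If the "100M" card is really 100 Mbit/s rather than 100 MB/s, that would be close to the link's capacity, but this is only a guess and would need to be checked against the actual NIC speed.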


2013/8/15 Jitendra Yadav <jeetuyadav200890@gmail.com>

> Hi,
> Looks like you got some pace. Did you also try the compression
> parameter? I think you will get more optimization with it. Also, file
> transfer speed depends on the network bandwidth between PNN/SNN and the
> network traffic between nodes. What's your network configuration?
> Thanks
>
>
> On Wed, Aug 14, 2013 at 11:39 AM, lei liu <liulei412@gmail.com> wrote:
>
>> I set "dfs.image.transfer.bandwidthPerSec" to 50M, and the
>> performance is shown below:
>>
>> 2013-08-14 12:32:33,079 INFO my.EditLogPerformance: totalCount:1342440
>> speed:1111
>> 2013-08-14 12:32:43,082 INFO my.EditLogPerformance: totalCount:1363338
>> speed:1044
>> 2013-08-14 12:32:53,085 INFO my.EditLogPerformance: totalCount:1385526
>> speed:1109
>> *2013-08-14 12:33:03,087 INFO my.EditLogPerformance: totalCount:1396324
>> speed:539*
>> *2013-08-14 12:33:13,090 INFO my.EditLogPerformance: totalCount:1406232
>> speed:495
>> 2013-08-14 12:33:23,093 INFO my.EditLogPerformance: totalCount:1415006
>> speed:438
>> 2013-08-14 12:33:33,096 INFO my.EditLogPerformance: totalCount:1423952
>> speed:447*
>> *2013-08-14 12:33:43,099 INFO my.EditLogPerformance: totalCount:1437256
>> speed:665*
>> 2013-08-14 12:33:53,102 INFO my.EditLogPerformance: totalCount:1458378
>> speed:1056
>> 2013-08-14 12:34:03,106 INFO my.EditLogPerformance: totalCount:1479338
>> speed:1048
>> 2013-08-14 12:34:13,108 INFO my.EditLogPerformance: totalCount:1500400
>> speed:1053
>> 2013-08-14 12:34:23,111 INFO my.EditLogPerformance: totalCount:1521252
>> speed:1042
>> 2013-08-14 12:34:33,114 INFO my.EditLogPerformance: totalCount:1542286
>> speed:1051
>> 2013-08-14 12:34:43,117 INFO my.EditLogPerformance: totalCount:1562956
>> speed:1033
>> 2013-08-14 12:34:53,120 INFO my.EditLogPerformance: totalCount:1583804
>> speed:1042
>> 2013-08-14 12:35:03,123 INFO my.EditLogPerformance: totalCount:1606558
>> speed:1137
>> 2013-08-14 12:35:13,126 INFO my.EditLogPerformance: totalCount:1627980
>> speed:1071
>> 2013-08-14 12:35:23,129 INFO my.EditLogPerformance: totalCount:1650642
>> speed:1133
>> 2013-08-14 12:35:33,132 INFO my.EditLogPerformance: totalCount:1672806
>> speed:1108
>> 2013-08-14 12:35:43,134 INFO my.EditLogPerformance: totalCount:1693940
>> speed:1056
>> 2013-08-14 12:35:53,137 INFO my.EditLogPerformance: totalCount:1715430
>> speed:1074
>> 2013-08-14 12:36:03,140 INFO my.EditLogPerformance: totalCount:1737940
>> speed:1125
>> 2013-08-14 12:36:13,143 INFO my.EditLogPerformance: totalCount:1760094
>> speed:1107
>> 2013-08-14 12:36:23,146 INFO my.EditLogPerformance: totalCount:1781646
>> speed:1077
>> 2013-08-14 12:36:33,149 INFO my.EditLogPerformance: totalCount:1802230
>> speed:1029
>> 2013-08-14 12:36:43,152 INFO my.EditLogPerformance: totalCount:1824132
>> speed:1095
>> 2013-08-14 12:36:53,155 INFO my.EditLogPerformance: totalCount:1846778
>> speed:1132
>> 2013-08-14 12:37:03,158 INFO my.EditLogPerformance: totalCount:1868956
>> speed:1108
>> 2013-08-14 12:37:13,161 INFO my.EditLogPerformance: totalCount:1888556
>> speed:980
>> 2013-08-14 12:37:23,164 INFO my.EditLogPerformance: totalCount:1910512
>> speed:1097
>> 2013-08-14 12:37:33,167 INFO my.EditLogPerformance: totalCount:1932240
>> speed:1086
>> 2013-08-14 12:37:43,170 INFO my.EditLogPerformance: totalCount:1954226
>> speed:1099
>> 2013-08-14 12:37:53,173 INFO my.EditLogPerformance: totalCount:1974706
>> speed:1024
>> 2013-08-14 12:38:03,176 INFO my.EditLogPerformance: totalCount:1993906
>> speed:960
>> 2013-08-14 12:38:13,179 INFO my.EditLogPerformance: totalCount:2014172
>> speed:1013
>> 2013-08-14 12:38:23,182 INFO my.EditLogPerformance: totalCount:2036130
>> speed:1097
>> 2013-08-14 12:38:33,184 INFO my.EditLogPerformance: totalCount:2057848
>> speed:1085
>> 2013-08-14 12:38:43,187 INFO my.EditLogPerformance: totalCount:2078834
>> speed:1049
>> 2013-08-14 12:38:53,190 INFO my.EditLogPerformance: totalCount:2095616
>> speed:839
>> *2013-08-14 12:39:03,193 INFO my.EditLogPerformance: totalCount:2104896
>> speed:464
>> 2013-08-14 12:39:13,196 INFO my.EditLogPerformance: totalCount:2114572
>> speed:483
>> 2013-08-14 12:39:23,199 INFO my.EditLogPerformance: totalCount:2123512
>> speed:447*
>> *2013-08-14 12:39:33,202 INFO my.EditLogPerformance: totalCount:2133604
>> speed:504*
>> 2013-08-14 12:39:43,205 INFO my.EditLogPerformance: totalCount:2149792
>> speed:809
>>
>>
>>
>> The Active NameNode log contains the following:
>> 2013-08-14 12:44:47,301 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
>> to
>> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=655178418&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
>> 2013-08-14 12:48:57,529 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: *Transfer took
>> 250.23s at 10280.59 KB/s*
>> 2013-08-14 12:48:57,530 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
>> fsimage.ckpt_0000000000655178418 size 2634222089 bytes
>>
>>
>> The Standby NameNode log contains the following:
>> 2013-08-14 12:43:57,924 INFO
>> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
>> checkpoint because there have been 2421083 txns since the last checkpoint,
>> which exceeds the configured threshold 1000000
>> 2013-08-14 12:43:57,925 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
>> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000655178418
>> using no compression
>> 2013-08-14 12:48:58,044 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
>> 250.75s at 0.00 KB/s
>> 2013-08-14 12:48:58,045 INFO
>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
>> txid 655178418 to namenode at 10.232.98.77:20021
>>
>>
>>
>> *While the Active NameNode downloads the fsimage, the OPS is less than
>> 500; the OPS drops by 50%.*
>>
>> The "checkpoint.png" attachment is monitoring info in ganglia.
>>
>>
>> I have two questions:
>>
>>    - *First question:* Only 50% of the Active NameNode's bandwidth is
>>    occupied, so the Active NameNode still has 50M of bandwidth left to
>>    handle RPC requests. Why did the OPS drop by 50%?
>>
>>
>>    - *Second question:* The fsimage file is 2634222089 bytes, and the
>>    Active NameNode spends about 250s downloading it. The
>>    "dfs.image.transfer.bandwidthPerSec" value is 50M, so I think the
>>    download time should be about 50s.
>>
>>
>> Thanks,
>>
>> LiuLei
>>
>>
>>
>>
>>
>
