hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Harsh J <ha...@cloudera.com>
Subject Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.
Date Tue, 13 Aug 2013 10:36:02 GMT
Perhaps turning on fsimage compression may help. See
documentation of dfs.image.compress at
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.

You can also try to throttle the bandwidth it uses via
dfs.image.transfer.bandwidthPerSec.

On Tue, Aug 13, 2013 at 3:45 PM, lei liu <liulei412@gmail.com> wrote:
> I write one programm to test NameNode performance. Please see the
> EditLogPerformance.java
>
> I use 60 threads to execute the EditLogPerformance.javacode,  the testing
> result is below content:
> 2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
> 2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
> 2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
> 2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
> 2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
> 2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
> 2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
> 2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
> 2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
> 2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
> 2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
> 2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
> 2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
> 2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
> 2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
> 2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
> 2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
> 2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
> 2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
> 2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
> 2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
> 2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
> 2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
> 2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
> 2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
> 2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
> 2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
> 2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
> 2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
> 2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
> 2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
> 2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
> 2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
> 2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
> 2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
> 2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
> 2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
> 2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
> 2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
> 2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
> 2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
> 2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
> 2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
> (EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194
>
>
>
> The speed is less than ten sometimes. I find when Active NameNode download
> the fsimage file, the  speed is less than ten. So I think download fsimage
> file that affects the performance  of Active NameNode.
>
>
> There are below info in Standby NameNode:
> 2013-08-13 17:48:12,412 INFO
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
> checkpoint because there have been 2558038 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> 2013-08-13 17:48:12,413 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Saving image file
> /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
> using no compression
> 2013-08-13 17:49:19,085 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Image file of size 3385425100 saved in 66 seconds.
> 2013-08-13 17:49:19,655 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 241.45s at 0.00 KB/s
> 2013-08-13 17:53:21,107 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
> txid 521186406 to namenode at 10.232.98.77:20021
>
>
> There are below info in Active NameNode:
> 2013-08-13 17:49:19,659 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
> to
> http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
> 240.95s at 13720.96 KB/s
> 2013-08-13 17:53:20,610 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_0000000000521186406 size 3385425100 bytes.
>
>
>
>
>
>
>
>
>
>
>
>
> 2013/8/13 Jitendra Yadav <jeetuyadav200890@gmail.com>
>>
>> Hi,
>>
>> Can you please let me know that how you identified the slowness between
>> primary and standby namnode?
>>
>> Also please share the network connection bandwidth between these two
>> servers.
>>
>> Thanks
>>
>> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <liulei412@gmail.com> wrote:
>>>
>>> The fsimage file size is 1658934155
>>>
>>>
>>> 2013/8/13 Harsh J <harsh@cloudera.com>
>>>>
>>>> How large are your checkpointed fsimage files?
>>>>
>>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <liulei412@gmail.com> wrote:
>>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>>> > Active
>>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>>> > the
>>>> > Active NameNode is slow?
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > LiuLei
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>
>



-- 
Harsh J

Mime
View raw message