hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From lei liu <liulei...@gmail.com>
Subject Re: when Standby Namenode is doing checkpoint, the Active NameNode is slow.
Date Tue, 13 Aug 2013 10:15:16 GMT
I write one programm to test NameNode performance. Please see the
EditLogPerformance.java

I use 60 threads to execute the EditLogPerformance.javacode,  the testing
result is below content:
2013-08-13 17:43:01,479 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10392810  speed:1055
2013-08-13 17:43:11,482 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407310  speed:725
2013-08-13 17:43:21,484 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407358  speed:2
2013-08-13 17:43:31,487 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407490  speed:6
2013-08-13 17:43:41,490 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10407624  speed:6
2013-08-13 17:43:51,493 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10408690  speed:53
2013-08-13 17:44:01,496 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10422220  speed:676
2013-08-13 17:44:11,499 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10445216  speed:1149
2013-08-13 17:44:21,502 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10465166  speed:997
2013-08-13 17:44:31,505 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10486614  speed:1072
2013-08-13 17:44:41,508 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10506778  speed:1008
2013-08-13 17:44:51,511 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10526660  speed:994
2013-08-13 17:45:01,514 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10548092  speed:1071
2013-08-13 17:45:11,517 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10569892  speed:1090
2013-08-13 17:45:21,520 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10593296  speed:1170
2013-08-13 17:45:31,523 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10614478  speed:1059
2013-08-13 17:45:41,526 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10636006  speed:1076
2013-08-13 17:45:51,529 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10656430  speed:1021
2013-08-13 17:46:01,532 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10677774  speed:1067
2013-08-13 17:46:11,534 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10699096  speed:1066
2013-08-13 17:46:21,537 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10720970  speed:1093
2013-08-13 17:46:31,540 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10741432  speed:1023
2013-08-13 17:46:41,543 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10760854  speed:971
2013-08-13 17:46:51,546 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10781680  speed:1041
2013-08-13 17:47:01,549 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10802302  speed:1031
2013-08-13 17:47:11,552 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10823888  speed:1079
2013-08-13 17:47:21,555 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10845276  speed:1069
2013-08-13 17:47:31,558 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10865470  speed:1009
2013-08-13 17:47:41,561 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10885046  speed:978
2013-08-13 17:47:51,564 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10905606  speed:1028
2013-08-13 17:48:01,567 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10926854  speed:1062
2013-08-13 17:48:11,570 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10946446  speed:979
2013-08-13 17:48:21,573 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10966554  speed:1005
2013-08-13 17:48:31,576 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:10986794  speed:1012
2013-08-13 17:48:41,579 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11007484  speed:1034
2013-08-13 17:48:51,581 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11028400  speed:1045
2013-08-13 17:49:01,584 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11049312  speed:1045
2013-08-13 17:49:11,587 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11070396  speed:1054
2013-08-13 17:49:21,590 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087408  speed:850
2013-08-13 17:49:31,593 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087418  speed:0
2013-08-13 17:49:41,596 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087546  speed:6
2013-08-13 17:49:51,599 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11087716  speed:8
2013-08-13 17:50:01,602 INFO  my.EditLogPerformance
(EditLogPerformance.java:run(37)) - totalCount:11091608  speed:194



The speed is less than ten sometimes. I find when Active NameNode download
the fsimage file, the  speed is less than ten. So I think download fsimage
file that affects the performance  of Active NameNode.


There are below info in Standby NameNode:
2013-08-13 17:48:12,412 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 2558038 txns since the last checkpoint,
which exceeds the configured threshold 1000000
2013-08-13 17:48:12,413 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000521186406
using no compression
2013-08-13 17:49:19,085 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
3385425100 saved in 66 seconds.
2013-08-13 17:49:19,655 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://10.232.98.77:20021/getimage?putimage=1&txid=521186406&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
241.45s at 0.00 KB/s
2013-08-13 17:53:21,107 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 521186406 to namenode at 10.232.98.77:20021


There are below info in Active NameNode:
2013-08-13 17:49:19,659 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://dw78.kgb.sqa.cm4:20021/getimage?getimage=1&txid=521186406&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
240.95s at 13720.96 KB/s
2013-08-13 17:53:20,610 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
fsimage.ckpt_0000000000521186406 size 3385425100 bytes.












2013/8/13 Jitendra Yadav <jeetuyadav200890@gmail.com>

> Hi,
>
> Can you please let me know that how you identified the slowness between
> primary and standby namnode?
>
> Also please share the network connection bandwidth between these two
> servers.
>
> Thanks
>
> On Tue, Aug 13, 2013 at 11:52 AM, lei liu <liulei412@gmail.com> wrote:
>
>> The fsimage file size is 1658934155
>>
>>
>> 2013/8/13 Harsh J <harsh@cloudera.com>
>>
>>> How large are your checkpointed fsimage files?
>>>
>>> On Mon, Aug 12, 2013 at 3:42 PM, lei liu <liulei412@gmail.com> wrote:
>>> > When Standby Namenode is doing checkpoint,  upload the image file to
>>> Active
>>> > NameNode, the Active NameNode is very slow. What is reason result to
>>> the
>>> > Active NameNode is slow?
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > LiuLei
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Mime
View raw message