hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-7609) startup used too much time to load edits
Date Fri, 03 Apr 2015 15:52:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394599#comment-14394599 ]

Kihwal Lee edited comment on HDFS-7609 at 4/3/15 3:52 PM:
----------------------------------------------------------

We have seen a related case. In a relatively small cluster, a user created a rogue job that
generated a very large number of transactions on the namenode.  The ANN was rolling the edit
log on its own before the regular rolling period was reached.  The SBN then started losing
datanodes because it took an extremely long time to replay the large edit segment. We normally
see a replay speed of about 30-80k txns/sec (still considerably slower than 0.23 or 2.x before
the introduction of the RetryCache), but in this case it was down to 2k txns/sec, so replaying
the one huge segment took several hours.
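
For a rough sense of scale (taking a backlog on the order of the ~100 million transactions
mentioned in the description below, and ~50k txns/sec as a typical mid-range replay rate):
{noformat}
100,000,000 txns /  2,000 txns/sec ≈ 50,000 s ≈ 14 hours
100,000,000 txns / 50,000 txns/sec ≈  2,000 s ≈ 33 minutes
{noformat}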

In this case, the slowdown was caused by the cache being too small. Since the cache size
defaults to 0.03% of the heap, the hash table (GSet) had long chains in each slot while the
edit segment was being replayed. Increasing the cache size would have helped.  Since the
transaction rate is not always a function of the size of the namespace, the default cache
size may not work well in many cases.
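
A rough sketch of how a percentage-of-heap cache capacity behaves (this is not the actual
NameNode sizing code; the key name dfs.namenode.retrycache.heap.percent and the 0.03 default
are quoted from memory, so treat them as assumptions):
{code:java}
// Sketch only: approximates a GSet whose slot count is a percentage of max heap.
public class RetryCacheSizingSketch {
  public static void main(String[] args) {
    double heapPercent = 0.03;                         // percent of max heap (assumed default)
    long maxHeap = Runtime.getRuntime().maxMemory();   // e.g. set via -Xmx
    int refSize = 8;                                   // assumed reference size in bytes
    long slots = (long) (maxHeap * (heapPercent / 100.0)) / refSize;

    // If the number of live (unexpired) entries far exceeds the slot count,
    // each GSet slot degenerates into a long chain, and every insert/lookup
    // during replay has to walk it.
    long liveEntries = 100_000_000L;                   // e.g. a huge edit backlog
    System.out.printf("slots=%d, approx. chain length=%.1f%n",
        slots, (double) liveEntries / slots);
  }
}
{code}
With a 4 GB heap that works out to roughly 150k slots, so a backlog in the hundreds of
millions of transactions gives chains hundreds of entries long. Bumping the heap-percent
setting (assuming that is indeed the knob) is the immediate mitigation.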

Also, if the edit rolling period is greater than the cache expiration time (e.g. 10 min), it
may make sense to purge the entire cache in a more efficient way before replaying the new
segment. We could record the time when a segment replay finishes and check the elapsed time
at the start of the next segment replay.
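
A minimal sketch of that idea, with made-up names (not the real RetryCache/FSEditLogLoader
API; the real hooks would live around the retry cache and the edit-replay path):
{code:java}
// If the gap since the previous segment replay finished already exceeds the cache
// expiration period, every cached entry must be stale, so the whole cache can be
// dropped in one shot instead of being expired entry by entry during replay.
class ReplayCachePurgerSketch {
  interface PurgableCache { void clear(); }     // stand-in for the retry cache

  private final long expiryMillis;              // e.g. the 10 min expiration time
  private long lastSegmentFinishMillis = -1;

  ReplayCachePurgerSketch(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  /** Call just before replaying a new edit segment. */
  void maybePurge(PurgableCache cache) {
    long now = System.nanoTime() / 1_000_000;   // monotonic clock, in millis
    if (lastSegmentFinishMillis >= 0 && now - lastSegmentFinishMillis > expiryMillis) {
      cache.clear();                            // everything in it has expired anyway
    }
  }

  /** Call right after a segment replay completes. */
  void segmentFinished() {
    lastSegmentFinishMillis = System.nanoTime() / 1_000_000;
  }
}
{code}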


> startup used too much time to load edits
> ----------------------------------------
>
>                 Key: HDFS-7609
>                 URL: https://issues.apache.org/jira/browse/HDFS-7609
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.2.0
>            Reporter: Carrey Zhan
>         Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, recovery_do_not_use_retrycache.patch
>
>
> One day my namenode crashed because two journal nodes timed out at the same time under
very high load, leaving behind about 100 million transactions in the edits log. (I still have
no idea why they were not rolled into the fsimage.)
> I tried to restart the namenode, but it showed that almost 20 hours would be needed to
finish, and it was loading fsedits most of the time. I also tried to restart the namenode in
recovery mode, but the loading speed was no different.
> I looked into the stack trace and judged that the slowness was caused by the retry cache.
So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an
hour.
> I think the retry cache is useless during startup, at least during the recovery process.
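
For anyone hitting the same symptom, the workaround described above amounts to flipping one
setting; a rough sketch using the Hadoop Configuration API (in practice this would go into
hdfs-site.xml; the heap-percent key is quoted from memory and may differ):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DisableRetryCacheSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Workaround used by the reporter: skip the retry cache entirely on restart.
    conf.setBoolean("dfs.namenode.enable.retrycache", false);

    // Alternative mitigation discussed in the comment above: give the cache
    // more room instead of disabling it (default believed to be 0.03).
    conf.setFloat("dfs.namenode.retrycache.heap.percent", 0.3f);

    System.out.println("enable.retrycache = "
        + conf.getBoolean("dfs.namenode.enable.retrycache", true));
  }
}
{code}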



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
