hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13044) RBF: Add a safe mode for the Router
Date Thu, 25 Jan 2018 12:01:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339120#comment-16339120
] 

Yiqun Lin edited comment on HDFS-13044 at 1/25/18 12:00 PM:
------------------------------------------------------------

Hi [~elgoiri], there are some initial review comments from me.

*DFSConfigKeys.java*
 1. Can we rename config name of {{DFS_ROUTER_SAFEMODE_CACHE_EXPIRATION_MS}} from {{dfs.federation.router.safemode.cacheexpiration.count}}
to {{dfs.federation.router.safemode.cache-expiration.interval}}? Why not use {{2 or 3 * DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT}}
as the default expiration interval time? Looks these two config are related, this will be
better than just assigning a new value.
 2. It would be better to make these configs support multiple time units suffixes.
 3. Please document these configs in {{hdfs-default.xml}}.

*Router.java*
 1.Following lines need to update.
{code:java}
+    if (this.safemodeService == null) {
+      // Router is running now
+      // TODO
+    }
{code}
2. line125. This comment seems to be {{/** The start time of the Router. */}}. This was something
I am missing caught before.

 *RouterRpcServer.java*
 line425: Why not define and throw a sub-exception of {{SafeModeException}}?
 line426: Missing "is" before {{in safe mode and cannot handle}}.
 line446: Can we rename {{isSafeMode}} to {{isInSafeMode}}?

*RouterSafemodeService.java*
 line33 and 36: Seems {{FederationStateStoreService}} is non-exist.
 line55: Maybe we need to define a new field named {{enterTime}} to calculate elapsed time
during safemoe. And {{enterTime}} will be updated in {{enter}} method every time. We may experience
enter/leave safemode many times in real env.
 line82: {{synchronized}} is not needed since {{MutableGaugeInt#set}} is thread-safe to use.
It uses AtomicInteger type to record value inside.
 line84: Can we print the elapsed time during safemode as well?
 line90: {{(int)}} is not needed, it already did in {{RouterMetrics#setSafeModeTime}}.
 line110 and 117: Can we print the interval time with {{MILLISECONDS}} format? There is a
chance the value loosing accurate. For example, we set the interval time 999ms, it will return
0 when convert to second format.

*TestRouterSafemode.java*
 line43: The javadoc looks confused. There are two duplicate {{the}}. {{SafeModeTimer}} is
non-exist.
 line116: We are just for the testing, no need to sleep so long time. We can greatly decrease
the interval time tested in this class. The same problem in {{testRouterEnterSafemode}}.
 Additional two test cases we may needed:
 1. Specific test for {{dfs.federation.router.safemode.startup.interval}}. That is say, router
won't leaf safemode until over interval time.
 2. The case that RouterRpcServer rejects the reuquests when it's in safemode.

 
{quote}We could also add an API in the Router Admin to set the safe mode by hand.
{quote}
+1 for this proposal. We  can file new JIRA for the tracking.

In addition, we need to document the meaning of router states. At least I see the semantic
of safemode in RBF is different with that in normal HDFS.


was (Author: linyiqun):
Hi [~elgoiri], there are some initial review comments from me.

*DFSConfigKeys.java*
 1. Can we rename config name of {{DFS_ROUTER_SAFEMODE_CACHE_EXPIRATION_MS}} from {{dfs.federation.router.safemode.cacheexpiration.count}}
to {{dfs.federation.router.safemode.cache-expiration.interval}}? Why not use {{2 or 3 * DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT}}
as the default expiration interval time? Looks these two config are related, this will be
better than just assigning a new value.
 2. It would be better to make these configs support multiple time units suffixes.
 3. Please document these configs in {{hdfs-default.xml}}.

*Router.java*
 1.Following lines need to update.
{code:java}
+    if (this.safemodeService == null) {
+      // Router is running now
+      // TODO
+    }
{code}
2. line125. This comment seems to be {{/** The start time of the Router. */}}. This was something
I am missing caught before.

 *RouterRpcServer.java*
 line425: Why not define and throw a sub-exception of {{SafeModeException}}?
 line426: Missing "is" before {{in safe mode and cannot handle}}.
 line446: Can we rename {{isSafeMode}} to {{isInSafeMode}}?

*RouterSafemodeService.java*
 line33 and 36: Seems {{FederationStateStoreService}} is non-exist.
 line55: Maybe we need to define a new field named {{enterTime}} to calculate elapsed time
during safemoe. And {{enterTime}} will be updated in {{enter}} method every time. We may experience
enter/leave safemode many times in real env.
 line82: {{synchronized}} is not needed since {{MutableGaugeInt#set}} is thread-safe to use.
It uses AtomicInteger type to record value inside.
 line84: Can we print the elapsed time during safemode as well?
 line90: {{(int)}} is not needed, it already did in {{RouterMetrics#setSafeModeTime}}.
 line110 and 117: Can we print the interval time with {{MILLISECONDS}} format? There is a
chance the value loosing accurate. For example, we set the interval time 999ms, it will return
0 when convert to second format.

*TestRouterSafemode.java*
 line43: The javadoc looks confused. There are two duplicate {{the}}. {{SafeModeTimer}} is
non-exist.
 line116: We are just for the testing, no need to sleep so long time. We can greatly decrease
the interval time tested in this class. The same problem in {{testRouterEnterSafemode}}.
 Additional two test cases we may needed:
 1. Specific test for {{dfs.federation.router.safemode.startup.interval}}. That is say, router
won't leaf safemode until passing interval time.
 2. The case that RouterRpcServer rejects the reuquests when it's in safemode.

 
{quote}We could also add an API in the Router Admin to set the safe mode by hand.
{quote}
+1 for this proposal. We  can file new JIRA for the tracking.

In addition, we need to document the meaning of router states. At least I see the semantic
of safemode in RBF is different with that in normal HDFS.

> RBF: Add a safe mode for the Router
> -----------------------------------
>
>                 Key: HDFS-13044
>                 URL: https://issues.apache.org/jira/browse/HDFS-13044
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-13004.000.patch
>
>
> When a Router cannot communicate with the State Store, it should enter into a safe mode
that disallows certain operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message