hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-3663) Federation State and Policy Store (DBMS implementation)
Date Thu, 13 Apr 2017 22:21:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968218#comment-15968218
] 

Carlo Curino edited comment on YARN-3663 at 4/13/17 10:20 PM:
--------------------------------------------------------------

Thanks [~giovanni.fumarola] for the updated patch.

Conf:
 # You set {{YarnConfiguration.DEFAULT_FEDERATION_STATESTORE_SQL_MAXCONNECTIONS}} to 1. Isn't
this too small? What values you commonly use? I would put a conservative but not overly tight
value, otherwise users are forced to learn/modify more params.
 # We should  omit {{yarn.federation.state-store.sql.max-connections} from yarn-default.xml
like you do for all other params in this patch.
 
In {{SQLFederationStateStore}}
  # For most fields except {{CALL_SP_GET_ALL_POLICY_CONFIGURATIONS}} you simply use the plural
to differentiate the getter of one or all items. Be consistent (e.g., remove the ALL here,
or add it everywhere else)
  # Minor: userNameDB --> userName, passwordDB --> password
  # When you throw the exceptions (e.g., subcluster registration), it might be nice in the
message to include the sub-clusterID / ip or any other info one can use to debug.
  # Can you comment on why we are using: {{FederationStateStoreErrorCode}}? They don't seem
to be connected to SQL error codes, and they are not used anywhere else (we normally use named
exception, which are easier to understand/track).
  # at line 277-278: formatting
  # We should try to remove redundance, e.g., you have lots of things that look like this:
  {code}
      try {
      FederationMembershipStateStoreInputValidator
          .validateGetSubClusterInfoRequest(subClusterRequest);
    } catch (FederationStateStoreInvalidInputException e) {
      FederationStateStoreUtils.logAndThrowInvalidInputException(LOG,
          "Unable to obtain the infomation about the SubCluster. "
              + e.getMessage());
    }
  {code}
  They could be factored out to {{FederationMembershipStateStoreInputValidator.validate(subClusterRequest)}}
where the type of input param is used to differentiate the method, and the logAndThrowInvalidInputException
is done on that side. Same goes for {{checkSubClusterInfo}}.
  # Similarly to the above we should try to factor out the very repetitive code to create
connection/statements, set params, run, and throw. I don't have a specific advise on this,
but the code is mostly copy and paste, which we should avoid.
  # Move the {{fromStringToSubClusterState}} to the SubclusterState class (and call it fromString().

  # Why {{getSubCluster}} and {{getSubClusters}} use different mechanics for return values?
(registered params vs ResultSet)? Might be worth to be consistent (probably using ResultSet).
  # Line 540: Is this behavior (overwrite existing) consistent with general YARN? (I think
so, but want to check) 
  # Some of the log are a bit vague {{LOG.debug("Got the information about the specified application
}} say spefically what info where gotten
  # if you use {{LOG.debug}} consider to prefix it with a check if we are in debug mode (save
time/objects creations for the String that are then not used).
  # You have several {{      if (cstmt.getInt(2) != 1) }} ROWCOUNT checks. This mix the no
tuple where changed to multiple tuple where changed. Distinguishing the two cases, might help
debug (do we have duplicates in DB, or the entry was not found). (Not mandatory, just somethign
to consider)
  # {{setPolicyConfiguration}} You are doing {{cstmt.setBytes(3, new byte[policyConf.getParams().remaining()]);}}
which adds an empty byte[] instead of what is coming in input.
  # {{getCurrentVersion}} and {{loadVersion}} throw a NotSupportedException or something of
the sort, a silent return null is easy to confuse people. (I know the full version will be
in V2, let's just have a clear breakage if someone try to use this methods).
  
  
  (.... to be continued ...)

  (.... continued...)
  In {{FederationStateStoreInputValidator}}:
  # Please consider to rename the validate methods (ok to have separate JIRA for this). 
  
  In {{FederationStateStoreUtils}}
  # You log at info and debug level inconsistently for the {{set*}} you added. I would suggest
debug for all.
  
  In {{HSQLDBFederationStateStore}}
  # the empty constructor with super() is redundant, just omit the constructor alltogether.
  # I think the code would be more  readable if all the schema was in a well-formatted hsqldb-federation-schema.sql
(or broken down like the SQLServer one) and the code was reading from file the statements
and executing them. 
  # The use of {{SELECT *}} is kind of dangerous, because it hides field renames/moves and
other schema evolution issues, that might lead to hard to debug, I would use explicit named
fields always
  # I am not sure this is kosher {{SELECT applicationId_IN, homeSubCluster_IN}} line 133.
You are not actually returning the homeSubCluster that is in the DB, just what you sent in
input if the app is there. I believe you want to return the fields applicationId an homeSubCluster
and not the *_IN params.
  # shouldn't {{SP_SUBCLUSTERHEARTBEAT}} update the {{lastHeartBeat}} field? More concerning,
why do the tests miss this? If I am correct, you should fix this and more importantly add
coverage for this in the tests.
  # Please comment on line 136 {{HAVING COUNT(*) = 0 }}, what are you doing there?
  # in {{init}} I don't like the silent catching of exception... why do we have it? Since
this is only used for testing I rather have a loud failure than a silent moving on. 

  
  Store Procedures:
  # Can you create only one or two files called sqlserver-federation-schema.sql and sqlserver-federation-storeproc.sql?
Makes the reading easier, and git will do well handling row-level conflicts in most cases.
  # {{applicationHomeSubcluster}} switch from locale to UTC time consistently with other timestamps.
  # {{policies}} the {{params}} should be much bigger than 512. This might contain complex
info about how each queue/user should be mapped across multiple sub-clusters. I would argue
for a fairly large amount of space (we should have a small number of policies anyway). This
is particularly important if/when we will use things like Hekaton (since schema changes are
majorly painful).
  # {{sp_deleteExpiredApplicationsHomeSubCluster}} the -10 should not be hard-coded. How was
that chosen? What is that expressed in? I would argue this should be an input param, and the
value should come up from somewhere in yarn conf.
  # {{sp_deregisterSubCluster}} why is STATE configurable? Does it takes different values?

  # {{sp_registerSubCluster}} is {{@capability VARCHAR(6000)}} big enough? Same question for
the 256 char for the addresses.
  # (NOT TO FIX NOW) I don't love the fact that if two subcluster are mis-configured to clash
on subclusterID we silently override each other entries... I know this behavior is needed
to support HA, but we should (in V2) improve and detect issues like this. 
  # {{sp_getApplicationsHomeSubCluster, sp_getPoliciesConfigurations, sp_getSubClusters}}
use of {{SELECT *}} should be avoided.
  # {{sp_getPolicyConfiguration, sp_setPolicyConfiguration}} size of {{param}} should be increased.
  # {{sp_getSubCluster}} why using the outparams instead of ResultSet?
 
  
TOP LEVEL CONCERNS:
 # What is our schema-evolution story? Should we introduce views?
 # A concern are SQL injections, for example what happens if in {{SP_DELETEAPPLICATIONHOMESUBCLUSTER}}
someone sends in an applicationId = "(SELECT * FROM applications)" ? Do we remove all apps?
 (maybe bad example, but you see what I mean, possibly this is the FederationStateStoreInputValidator
responsibility, let's double check it is done properly).  
 # In v2 it would be nice to have a MySQL version as well, consider abstracting some of the
SQL stuff, are we architecturally ready to have multiple SQL dialects for store proc etc?
 # Do we have enough test coverage? E.g., the HSQLDB issues on updating heartbeat is concerning.
Please double check the tests cover enough of the possible issues (you saw some in production,
replicate them in the tests).




was (Author: curino):
Thanks [~giovanni.fumarola] for the updated patch.

Conf:
 # You set {{YarnConfiguration.DEFAULT_FEDERATION_STATESTORE_SQL_MAXCONNECTIONS}} to 1. Isn't
this too small? What values you commonly use? I would put a conservative but not overly tight
value, otherwise users are forced to learn/modify more params.
 # We should  omit {{yarn.federation.state-store.sql.max-connections} from yarn-default.xml
like you do for all other params in this patch.
 
In {{SQLFederationStateStore}}
  # For most fields except {{CALL_SP_GET_ALL_POLICY_CONFIGURATIONS}} you simply use the plural
to differentiate the getter of one or all items. Be consistent (e.g., remove the ALL here,
or add it everywhere else)
  # Minor: userNameDB --> userName, passwordDB --> password
  # When you throw the exceptions (e.g., subcluster registration), it might be nice in the
message to include the sub-clusterID / ip or any other info one can use to debug.
  # Can you comment on why we are using: {{FederationStateStoreErrorCode}}? They don't seem
to be connected to SQL error codes, and they are not used anywhere else (we normally use named
exception, which are easier to understand/track).
  # at line 277-278: formatting
  # We should try to remove redundance, e.g., you have lots of things that look like this:
  {code}
      try {
      FederationMembershipStateStoreInputValidator
          .validateGetSubClusterInfoRequest(subClusterRequest);
    } catch (FederationStateStoreInvalidInputException e) {
      FederationStateStoreUtils.logAndThrowInvalidInputException(LOG,
          "Unable to obtain the infomation about the SubCluster. "
              + e.getMessage());
    }
  {code}
  They could be factored out to {{FederationMembershipStateStoreInputValidator.validate(subClusterRequest)}}
where the type of input param is used to differentiate the method, and the logAndThrowInvalidInputException
is done on that side. Same goes for {{checkSubClusterInfo}}.
  # Similarly to the above we should try to factor out the very repetitive code to create
connection/statements, set params, run, and throw. I don't have a specific advise on this,
but the code is mostly copy and paste, which we should avoid.
  # Move the {{fromStringToSubClusterState}} to the SubclusterState class (and call it fromString().

  # Why {{getSubCluster}} and {{getSubClusters}} use different mechanics for return values?
(registered params vs ResultSet)? Might be worth to be consistent (probably using ResultSet).
  # Line 540: Is this behavior (overwrite existing) consistent with general YARN? (I think
so, but want to check) 
  # Some of the log are a bit vague {{LOG.debug("Got the information about the specified application
}} say spefically what info where gotten
  # if you use {{LOG.debug}} consider to prefix it with a check if we are in debug mode (save
time/objects creations for the String that are then not used).
  # You have several {{      if (cstmt.getInt(2) != 1) }} ROWCOUNT checks. This mix the no
tuple where changed to multiple tuple where changed. Distinguishing the two cases, might help
debug (do we have duplicates in DB, or the entry was not found). (Not mandatory, just somethign
to consider)
  # {{setPolicyConfiguration}} You are doing {{cstmt.setBytes(3, new byte[policyConf.getParams().remaining()]);}}
which adds an empty byte[] instead of what is coming in input.
  # {{getCurrentVersion}} and {{loadVersion}} throw a NotSupportedException or something of
the sort, a silent return null is easy to confuse people. (I know the full version will be
in V2, let's just have a clear breakage if someone try to use this methods).
  
  
  (.... to be continued ...)

  (.... continued...)
  In {{FederationStateStoreInputValidator}}:
  # Please consider to rename the validate methods (ok to have separate JIRA for this). 
  
  In {{FederationStateStoreUtils}}
  # You log at info and debug level inconsistently for the {{set*}} you added. I would suggest
debug for all.
  
  In {{HSQLDBFederationStateStore}}
  # the empty constructor with super() is redundant, just omit the constructor alltogether.
  # I think the code would be more  readable if all the schema was in a well-formatted hsqldb-federation-schema.sql
(or broken down like the SQLServer one) and the code was reading from file the statements
and executing them. 
  # The use of {{SELECT *}} is kind of dangerous, because it hides field renames/moves and
other schema evolution issues, that might lead to hard to debug, I would use explicit named
fields always
  # I am not sure this is kosher {{SELECT applicationId_IN, homeSubCluster_IN}} line 133.
You are not actually returning the homeSubCluster that is in the DB, just what you sent in
input if the app is there. I believe you want to return the fields applicationId an homeSubCluster
and not the *_IN params.
  # shouldn't {{SP_SUBCLUSTERHEARTBEAT}} update the {{lastHeartBeat}} field? More concerning,
why do the tests miss this? If I am correct, you should fix this and more importantly add
coverage for this in the tests.
  # Please comment on line 136 {{HAVING COUNT(*) = 0 }}, what are you doing there?
  # in {{init}} I don't like the silent catching of exception... why do we have it? Since
this is only used for testing I rather have a loud failure than a silent moving on. 

  
  Store Procedures:
  # Can you create only one or two files called sqlserver-federation-schema.sql and sqlserver-federation-storeproc.sql?
Makes the reading easier, and git will do well handling row-level conflicts in most cases.
  # What are the PRINT statement in the store proc give us? Is this part of SQL logs? Isn't
this redundant/expensive?
  # {{policies}} the {{params}} should be much bigger than 512. This might contain complex
info about how each queue/user should be mapped across multiple sub-clusters. I would argue
for a fairly large amount of space (we should have a small number of policies anyway). This
is particularly important if/when we will use things like Hekaton (since schema changes are
majorly painful).
  # {{sp_deleteExpiredApplicationsHomeSubCluster}} the -10 should not be hard-coded. How was
that chosen? What is that expressed in? I would argue this should be an input param, and the
value should come up from somewhere in yarn conf.
  # {{sp_deregisterSubCluster}} why is STATE configurable? Does it takes different values?

  # {{sp_registerSubCluster}} is {{@capability VARCHAR(6000)}} big enough? Same question for
the 256 char for the addresses.
  # (NOT TO FIX NOW) I don't love the fact that if two subcluster are mis-configured to clash
on subclusterID we silently override each other entries... I know this behavior is needed
to support HA, but we should (in V2) improve and detect issues like this. 
  # {{sp_getApplicationsHomeSubCluster, sp_getPoliciesConfigurations, sp_getSubClusters}}
use of {{SELECT *}} should be avoided.
  # {{sp_getPolicyConfiguration, sp_setPolicyConfiguration}} size of {{param}} should be increased.
  # {{sp_getSubCluster}} why using the outparams instead of ResultSet?
 
  
TOP LEVEL CONCERNS:
 # What is our schema-evolution story? Should we introduce views?
 # A concern are SQL injections, for example what happens if in {{SP_DELETEAPPLICATIONHOMESUBCLUSTER}}
someone sends in an applicationId = "(SELECT * FROM applications)" ? Do we remove all apps?
 (maybe bad example, but you see what I mean, possibly this is the FederationStateStoreInputValidator
responsibility, let's double check it is done properly).  
 # In v2 it would be nice to have a MySQL version as well, consider abstracting some of the
SQL stuff, are we architecturally ready to have multiple SQL dialects for store proc etc?
 # Do we have enough test coverage? E.g., the HSQLDB issues on updating heartbeat is concerning.
Please double check the tests cover enough of the possible issues (you saw some in production,
replicate them in the tests).



> Federation State and Policy Store (DBMS implementation)
> -------------------------------------------------------
>
>                 Key: YARN-3663
>                 URL: https://issues.apache.org/jira/browse/YARN-3663
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>    Affects Versions: YARN-2915
>            Reporter: Giovanni Matteo Fumarola
>            Assignee: Giovanni Matteo Fumarola
>         Attachments: YARN-3663-YARN-2915.v1.patch, YARN-3663-YARN-2915.v2.patch
>
>
> This JIRA tracks a SQL-based implementation of the Federation State and Policy Store,
which implements YARN-3662 APIs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message