hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7217) Improve API service usability for updating service spec and state
Date Thu, 26 Oct 2017 20:07:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221127#comment-16221127
] 

Eric Yang commented on YARN-7217:
---------------------------------

[~jianhe], thank you for reviewing the patch. Here are the answers:

{quote}
- should solr and fs be a pluggable implementation of a common interface ? Basically, should
it be either fs or solr back-end. Right now it's both there.
{quote}

This JIRA is a transitional step.  Solr is used as an alternate storage mechanism to cover
a gap in the current HDFS storage mechanism, which cannot list applications for all users.
 Let's leave the storage change to another JIRA.

{quote}
- getServicesList: it assumes solr is enabled, if not, it will throw NPE. I think we should
conditionally check if solr is enabled, if not, throw exception saying only solr backend is
supported for this endpoint.
{quote}

getServicesList never throws an NPE.  ysc is initialized in the constructor.  If Solr is disabled,
the endpoint returns the SERVICE_UNAVAILABLE HTTP code.  This is verified by testGetServicesList
in the TestApiServer test case.

{quote}
- similarly for getServiceSpec endpoint, it will throw NPE because ysc is null, if solr is
not enabled.
{quote}

Same answer as above: ysc is never null because it is initialized in the constructor.  If
ysc were left uninitialized when Solr is disabled, as suggested, an NPE could then occur.
 I agree that the code can be more consistent about how it checks whether Solr is enabled,
and I will revise it accordingly.
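
As a rough illustration of the behavior described above, here is a minimal sketch in which the client is always constructed and carries its own enabled flag, so the endpoint can answer SERVICE_UNAVAILABLE without a null check.  The class names SketchSolrClient and SketchApiServer, the method names, and the empty result list are all hypothetical stand-ins, not the classes in the patch.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for YarnSolrClient: always constructed, owns its enabled flag. */
final class SketchSolrClient {
    private final boolean enabled;

    SketchSolrClient(boolean enabled) {
        this.enabled = enabled;
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Returns service names; the empty list stands in for a real Solr query. */
    List<String> listServices() {
        if (!enabled) {
            throw new IllegalStateException("Solr back end is disabled");
        }
        return Collections.emptyList();
    }
}

/** Hypothetical stand-in for the API server endpoint. */
final class SketchApiServer {
    static final int OK = 200;
    static final int SERVICE_UNAVAILABLE = 503;

    // Assigned in the constructor, so this field is never null.
    private final SketchSolrClient ysc;

    SketchApiServer(boolean solrEnabled) {
        this.ysc = new SketchSolrClient(solrEnabled);
    }

    /** Returns the HTTP status code the list endpoint would answer with. */
    int getServicesListStatus() {
        if (!ysc.isEnabled()) {
            return SERVICE_UNAVAILABLE;
        }
        ysc.listServices();
        return OK;
    }
}
```

The point of the sketch is only that the disabled case is handled by asking the client for its state, not by testing the field for null.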

{quote}
- similarly TestYarnNativeServices#testChangeSpec, as discussed, we won't need to restart
the entire service to update the spec ? what's the use case for this ?
{quote}

Per our discussion this morning, it is best to keep the configuration change and the restart
operation as two separate calls.  This allows the configuration to be updated while deployment
is held off until a suitable time window becomes available, at which point the service is
restarted.  This gives the system administrator more fine-grained control: persist the desired
configuration change, then choose either to restart the service or to add more nodes without a restart.

{quote}
- Should it be if solr is enabled, create the solrClient ? if solr is not enabled, there's
no point creating the solrClient
{quote}

The Solr-enabled flag is kept in the YarnSolrClient object so that its internal state stays
self-contained, instead of tracking the flag in ServiceClient.  I could add an if statement to
skip initialization of the YARN Solr client.  However, it seems redundant to then guard against
NPE in if statements wherever YarnSolrClient was not initialized.  Hence, I will not make a change here.

{quote}
- updateComponent api should also update the spec in solr ?
- the username parameter is not used in findAppEntry API at all, but the deployApp inserts
the username, then why is the username required in the first place ?
- similarly, username is not used in deleteApp, then why do we need to get the username in
caller in the first place
{quote}

I will fix these bugs.

{quote}
All services configs are currently in YarnServiceConf class, I think we can put the new configs
there to not mix with the core YarnConfigurations, until the feature and config namings are
stable, we can merge them back to YarnConfiguration.
{quote}

We should avoid introducing sub-configuration without exposing it at the upper level.  The chance
that someone else introduces a duplicate hierarchy is high, and then it becomes painful to merge.
I recommend upstreaming the configuration knobs to the upper level to avoid doing the same thing
over and over.  This is a difference in philosophy about how to handle changes.  Since we are
already on a branch, there is no risk in introducing them to yarn-common directly.  I will not
make a change here.

{quote}
could you explain the logic below ? looks like it tries to look for all entries with "id:appName"
and the while loop continues until the last one is found, and returns the last one . Presumably
there's only 1 entry, then why is a while loop required? If there are multiple entries, why
return the last one ?
{quote}

There will only be one match because this is a single-entry query.  However, Solr doesn't
have a single-entry lookup interface, so I just use the common Iterator interface provided by
Solr.  That is why it is in a while loop.  I can change it to if .. else to make
it more readable.
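
A minimal sketch of the if .. else form, assuming the query result is exposed as a plain java.util.Iterator.  The class and method names here are hypothetical and no Solr types are used.  Note that this returns the first element where the while loop returned the last; the two only agree when there is at most one match, which is the assumption stated above.

```java
import java.util.Iterator;
import java.util.Optional;

final class SingleEntryLookup {
    /**
     * Returns the first element of the result iterator, or an empty
     * Optional when the query matched nothing.
     */
    static <T> Optional<T> firstMatch(Iterator<T> results) {
        if (results.hasNext()) {
            // ofNullable tolerates a null element instead of throwing.
            return Optional.ofNullable(results.next());
        } else {
            return Optional.empty();
        }
    }
}
```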

Thanks for the suggestions.  I will make the improvements and upload another patch.  Let me
know if you have any doubts about my comments.  Thanks.

> Improve API service usability for updating service spec and state
> -----------------------------------------------------------------
>
>                 Key: YARN-7217
>                 URL: https://issues.apache.org/jira/browse/YARN-7217
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, applications
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>         Attachments: YARN-7217.yarn-native-services.001.patch, YARN-7217.yarn-native-services.002.patch,
YARN-7217.yarn-native-services.003.patch, YARN-7217.yarn-native-services.004.patch, YARN-7217.yarn-native-services.005.patch
>
>
> The API service for deploying and managing YARN services has several limitations.
> The {{updateService}} API provides multiple functions:
> # Stopping a service.
> # Starting a service.
> # Increasing or decreasing the number of containers.  (This was removed in YARN-7323.)
> The overloading is buggy depending on how the configuration should be applied.
> h4. Scenario 1
> A user retrieves the Service object from a getService call, and the Service object contains
> state: STARTED.  The user would like to increase the number of containers for the deployed
> service.  The JSON has been updated to increase the container count, but the PUT method does
> not actually increase it.
> h4. Scenario 2
> A user retrieves the Service object from a getService call, and the Service object contains
> state: STOPPED.  The user would like to make an environment configuration change.  The
> configuration does not get updated after the PUT method.
> This could be addressed by rearranging the START/STOP logic to run after the configuration
> update.  However, other combinations can still break the PUT method.  For example, a user
> may want to make configuration changes but not restart the service until a later time.
> h4. Scenario 3
> There is no API to list all applications deployed by the same user.
> h4. Scenario 4
> Desired state (spec) and current state are represented by the same Service object.  There
> is no easy way to tell whether "state" is the desired state to reach or the current state
> of the service.  It would be nice to be able to retrieve the desired state and the current
> state through separate entry points.  Implementing /spec and /state resolves this problem.
> h4. Scenario 5
> Listing all services deployed by the same user can trigger a directory listing operation on
> the namenode if HDFS is used as storage for metadata.  When hundreds of users use the Service
> UI to view or deploy applications, this effectively becomes a denial-of-service attack on the
> namenode.  The sparse, small metadata files also reduce the efficiency of namenode memory
> usage.  Hence, a cache layer for storing service metadata can reduce namenode stress.
> h3. Proposed change
> ApiService can separate the PUT method into two PUT methods, one for configuration changes
> and one for operation changes.  The new API could look like:
> {code}
> @PUT
> /ws/v1/services/[service_name]/spec
> Request Data:
> {
>   "name": "amp",
>   "components": [
>     {
>       "name": "mysql",
>       "number_of_containers": 2,
>       "artifact": {
>         "id": "centos/mysql-57-centos7:latest",
>         "type": "DOCKER"
>       },
>       "run_privileged_container": false,
>       "launch_command": "",
>       "resource": {
>         "cpus": 1,
>         "memory": "2048"
>       },
>       "configuration": {
>         "env": {
>           "MYSQL_USER":"${USER}",
>           "MYSQL_PASSWORD":"password"
>         }
>       }
>     }
>   ],
>   "quicklinks": {
>     "Apache Document Root": "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/",
>     "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/"
>   }
> }
> {code}
> {code}
> @PUT
> /ws/v1/services/[service_name]/state
> Request data:
> {
>   "name": "amp",
>   "components": [
>     {
>       "name": "mysql",
>       "state": "STOPPED"
>     }
>   ]
> }
> {code}
> SOLR can be used to cache the Yarnfile to improve lookup performance and to reduce namenode
> stress from small files and high-frequency lookups.  SOLR is chosen for caching metadata
> because its indexing feature can also be used to build full-text search for the application
> catalog.
> For a service that requires configuration changes to increase or decrease node count,
> the calling sequence is:
> {code}
> # GET /ws/v1/services/{service_name}/spec
> # Change number_of_containers to desired number.
> # PUT /ws/v1/services/{service_name}/spec to update the spec.
> # PUT /ws/v1/services/{service_name}/state to stop existing service.
> # PUT /ws/v1/services/{service_name}/state to start service.
> {code}
> For components that can increase node count without rewriting configuration:
> {code}
> # GET /ws/v1/services/{service_name}/spec
> # Change number_of_containers to desired number.
> # PUT /ws/v1/services/{service_name}/spec to update the spec.
> # PUT /ws/v1/services/{service_name}/component/{component_name} to change node count.
> {code}
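
The two calling sequences in the issue description above can be sketched as ordered lists of HTTP calls.  This is only an illustration under assumptions: the ServiceCallPlan class and its method names are hypothetical, the paths are taken from the proposal, and nothing is sent over the network.  The step "Change number_of_containers to desired number" is a local edit of the fetched spec, so it does not appear as a call.

```java
import java.util.ArrayList;
import java.util.List;

final class ServiceCallPlan {
    private static final String BASE = "/ws/v1/services/";

    /** Spec change that needs a restart: update the spec, then stop and start. */
    static List<String> restartSequence(String service) {
        List<String> calls = new ArrayList<>();
        calls.add("GET " + BASE + service + "/spec");
        calls.add("PUT " + BASE + service + "/spec");   // spec with the new container count
        calls.add("PUT " + BASE + service + "/state");  // request body asks for STOPPED
        calls.add("PUT " + BASE + service + "/state");  // request body asks for STARTED
        return calls;
    }

    /** Spec change for a component that can change node count without a restart. */
    static List<String> flexSequence(String service, String component) {
        List<String> calls = new ArrayList<>();
        calls.add("GET " + BASE + service + "/spec");
        calls.add("PUT " + BASE + service + "/spec");   // persist the desired count
        calls.add("PUT " + BASE + service + "/component/" + component);
        return calls;
    }
}
```

For example, ServiceCallPlan.flexSequence("amp", "mysql") ends with a PUT to /ws/v1/services/amp/component/mysql, while the restart sequence ends with two PUTs to /state.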




