ambari-dev mailing list archives

From "John Speidel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-9022) Kerberos config lost + Cluster outage after adding Kafka service or Oozie service (or any service?)
Date Thu, 12 Feb 2015 16:05:11 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318438#comment-14318438 ]

John Speidel commented on AMBARI-9022:
--------------------------------------

This was several days ago and I have not tested again.
I have also made some small changes to my patch since then, but it is unlikely that these
changes would have any effect on this.

No, I don't have any logs but can describe the steps that I took.
- Install a 1 node cluster using the following blueprint (with my patch in place)
{   
  "host_groups" : [
    {
      "name" : "host_group_1",
      "components" : [      
        {
          "name" : "NODEMANAGER"
        },
        {
          "name" : "NAMENODE"
        },
        {
          "name" : "HISTORYSERVER"
        },
        {
          "name" : "ZOOKEEPER_SERVER"
        },
        {
          "name" : "SECONDARY_NAMENODE"
        },
        {
          "name" : "RESOURCEMANAGER"
        },  
        {
          "name" : "APP_TIMELINE_SERVER"
        },        
        {
          "name" : "DATANODE"
        },
        {
          "name" : "YARN_CLIENT"
        },
        {
          "name" : "ZOOKEEPER_CLIENT"
        },
        {
          "name" : "MAPREDUCE2_CLIENT"
        }     
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.2"
  }
}

- manually unzip UnlimitedJCEPolicy
- manually install MIT KDC
- Using the UI, kerberize the existing cluster
- Using the UI, add the Oozie service
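
For anyone reproducing the first step above, here is a minimal sketch of how the blueprint could be registered through the Ambari REST API. It only builds the request and does not send it. The host name, the blueprint name "single-node", and the admin/admin credentials are placeholders that I made up for illustration, and the blueprint body is abbreviated to a single component.

```python
import base64
import json
import urllib.request


def build_blueprint_request(base_url, blueprint_name, blueprint, user, password):
    """Build (but do not send) the POST that registers a blueprint with Ambari."""
    url = f"{base_url}/api/v1/blueprints/{blueprint_name}"
    body = json.dumps(blueprint).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    # Ambari rejects modifying requests that lack this header.
    req.add_header("X-Requested-By", "ambari")
    return req


# Abbreviated blueprint; the real one lists every component shown above.
blueprint = {
    "host_groups": [
        {
            "name": "host_group_1",
            "components": [{"name": "NAMENODE"}],
            "cardinality": "1",
        }
    ],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.2"},
}

# Hypothetical server address, blueprint name, and credentials.
req = build_blueprint_request(
    "http://ambari.example.com:8080", "single-node", blueprint, "admin", "admin"
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually submit it; creating the cluster from the blueprint needs a second request with a host mapping, which is not shown here.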

OOZIE_SERVER failed to start. The exception that I posted earlier came from the log
that the UI exposes for the Oozie start operation.


> Kerberos config lost + Cluster outage after adding Kafka service or Oozie service (or any service?)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-9022
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9022
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent, ambari-server, security
>    Affects Versions: 1.7.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Assignee: John Speidel
>            Priority: Blocker
>
> Adding the Kafka service to an existing kerberized HDP 2.2 cluster resulted in all the Kerberos fields in core-site.xml becoming blank or the literal string "null", which prevented all the HDFS and YARN instances from restarting. This caused a major outage. Luckily this cluster isn't prod, but this is going to bite somebody badly.
> Error observed in NameNode log:
> {code}2015-01-07 09:56:01,958 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(369)) - Clients are to use nameservice1 to access this namenode/service.
> 2015-01-07 09:56:02,055 FATAL namenode.NameNode (NameNode.java:main(1509)) - Failed to start namenode.
> java.lang.IllegalArgumentException: Invalid rule: null
>         at org.apache.hadoop.security.authentication.util.KerberosName.parseRules(KerberosName.java:331)
>         at org.apache.hadoop.security.authentication.util.KerberosName.setRules(KerberosName.java:397)
>         at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:75)
>         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:263)
>         at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:583)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> 2015-01-07 09:56:02,062 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2015-01-07 09:56:02,064 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:{code}
> Fields which ended up with the literal string "null" as their value in core-site.xml:
> {code}hadoop.http.authentication.kerberos.keytab
> hadoop.http.authentication.kerberos.principal
> hadoop.security.auth_to_local{code}
> Fields which ended up with a blank ("") value in core-site.xml:
> {code}hadoop.http.authentication.cookie.domain
> hadoop.http.authentication.cookie.path
> hadoop.http.authentication.kerberos.name.rules
> hadoop.http.authentication.signature.secret
> hadoop.http.authentication.signature.secret.file
> hadoop.http.authentication.signer.secret.provider
> hadoop.http.authentication.signer.secret.provider.object
> hadoop.http.authentication.token.validity
> hadoop.http.filter.initializers{code}
> Previous config revisions now showed undefined values, which was definitely not the case before: for the past months this had been a working, fully kerberized cluster.
> Removing the Kafka service via REST API calls and restarting ambari-server didn't make the config reappear either.
> I had to de-kerberize and re-kerberize the whole cluster in Ambari in order to get all 12 of those configuration settings re-populated.
> A remaining side effect of this bug, even after recovering the cluster, is that all the previous config revisions are now ruined by the many undefined values, which would prevent the cluster from starting. They are therefore no longer viable as a backup to revert to for any reason. There doesn't seem to be much I can do to work around that.
> Ironically, the Kafka brokers started up fine after ruining all the core components, since Kafka has no security itself.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
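
The symptom described in the quoted report (core-site.xml values that are blank or the literal string "null") can be checked for mechanically. Below is a minimal sketch that assumes the core-site properties have already been loaded into a plain dict; the function name find_broken_properties and the sample values are hypothetical and not part of Ambari. Some properties may legitimately be blank on a healthy cluster, so the output is a list of candidates to review rather than proof of the bug.

```python
def find_broken_properties(properties):
    """Split suspicious properties into literal-"null" values and blank values."""
    null_literal = sorted(
        key
        for key, value in properties.items()
        if value is not None and str(value).strip().lower() == "null"
    )
    blank = sorted(
        key
        for key, value in properties.items()
        if value is None or str(value).strip() == ""
    )
    return {"null_literal": null_literal, "blank": blank}


# Made-up sample; the property names come from the report, the values do not.
core_site = {
    "fs.defaultFS": "hdfs://nameservice1",
    "hadoop.security.auth_to_local": "null",
    "hadoop.http.authentication.kerberos.principal": "null",
    "hadoop.http.authentication.cookie.domain": "",
    "hadoop.http.filter.initializers": "",
}

report = find_broken_properties(core_site)
for kind, keys in report.items():
    for key in keys:
        print(f"{kind}: {key}")
```

Running a check like this before restarting HDFS and YARN after adding a service would flag the damaged settings while the running daemons are still up.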



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
