From: Chris Schneider
Subject: Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradation
Date: Tue, 18 Feb 2014 17:54:19 -0800
To: user@hadoop.apache.org

Hi John,

My AWS Elastic MapReduce NameNode is also filling its log file with messages like the following:

2014-02-18 23:56:52,344 WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 78 on 9000): No groups available for user job_201402182309_0073
2014-02-18 23:56:52,351 WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 48 on 9000): No groups available for user job_201402182309_0073
2014-02-18 23:56:52,356 WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 38 on 9000): No groups available for user job_201402182309_0073

I ran into this same issue in March 2013 and got past it by using an m1.xlarge master node (instead of my usual m1.large) whenever I double my slave count (to 32 cc2.8xlarge instances) to re-import a lot of my input data, as I'm doing right now. Using the m1.xlarge didn't prevent the NameNode from logging messages like this, but the beefier instance seemed to weather the load these messages represent better.

Unfortunately, even my m1.xlarge master node now seems overwhelmed. The cluster starts off fine, efficiently mowing through the jobs in my job flow step for a few hours, but it eventually gets into a mode where the copy phase of the reduce jobs appears to make no progress at all. At that point, the NameNode seems to be spending all of its time writing messages like the ones above.

The issue doesn't seem to be related to the NameNode JVM heap size (I tried increasing it to 4 GB before I realized it never used more than ~400 MB), nor to dfs.namenode.handler.count (which I increased from 64 to 96).
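(For reference, the handler-count change is just an hdfs-site.xml property on the master node; this is roughly what I set, though it clearly didn't address the root cause:)

{code}
<!-- hdfs-site.xml on the NameNode; 96 is simply the value I tried, not a recommendation -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>96</value>
</property>
{code}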

We're currently trying to work around the problem by hacking log4j.properties to set the logging level for org.apache.hadoop.security.UserGroupInformation to ERROR. We might have to do so for the entire package, as I've also seen the following in the NameNode logs:
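The override we're experimenting with looks roughly like this (a sketch only; which loggers actually need demoting is still an open question):

{code}
# Tentative additions to the NameNode's log4j.properties
log4j.logger.org.apache.hadoop.security.UserGroupInformation=ERROR
# ...or, if the ShellBasedUnixGroupsMapping warnings below persist, the whole package:
# log4j.logger.org.apache.hadoop.security=ERROR
{code}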

2014-02-19 01:01:24,184 WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping (IPC Server handler 84 on 9000): got exception trying to get groups for user job_201402182309_0226
org.apache.hadoop.util.Shell$ExitCodeException: id: job_201402182309_0226: No such user

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
    at org.apache.hadoop.util.Shell.run(Shell.java:182)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:78)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:53)
    at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
    at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1037)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:50)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5218)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:5201)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2030)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.getFileInfo(NameNode.java:850)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:573)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

I would also be very interested in hearing Jakob Homan and Deveraj Das respond to your analysis of the changes made for MAPREDUCE-1457.

Please post again with any further information you're able to glean about this problem.

Thanks,

- Chris

On Jan 8, 2014, at 1:26 PM, Jian Fang wrote:

Looked a bit deeper, and it seems this code was introduced by the following JIRA.

https://issues.apache.org/jira/browse/MAPREDUCE-1457

There is another related JIRA, i.e., https://issues.apache.org/jira/browse/MAPREDUCE-4329.

Perhaps the warning message is a side effect of MAPREDUCE-1457 when the cluster is running in non-secured mode. There must be some code path that causes the job id to be treated as the user name in the task tracker or job tracker; that job id is then passed on to the HDFS name node. This is definitely a big problem, since the heavy warning logging alone degraded system performance on a relatively big cluster.

This behavior is very easy to reproduce by simply running terasort on a cluster.
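Something like the following stand-alone sketch should trigger the same warning on a much smaller scale (a hypothetical demo class against the Hadoop 1.x UserGroupInformation API; the job id is just the one from the logs):

{code}
import org.apache.hadoop.security.UserGroupInformation;

// Resolving groups for a principal named after a job id shells out to the OS,
// finds no such Unix account, returns an empty array, and UserGroupInformation
// logs the "No groups available for user job_..." warning.
public class JobIdAsUserDemo {
  public static void main(String[] args) {
    UserGroupInformation ugi =
        UserGroupInformation.createRemoteUser("job_201401071758_0002");
    String[] groups = ugi.getGroupNames();  // expect an empty array plus a WARN in the log
    System.out.println("groups resolved: " + groups.length);
  }
}
{code}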

Any suggestions for how to fix this problem?




On Wed, Jan 8, 2014 at 11:18 AM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:
Thanks Vinod for your quick response. It is running in non-secure mode.

I still don't get the purpose of using the job id in the UGI. Could you please explain a bit more?

Thanks,

John


On Wed, Jan 8, 2014 at 10:11 AM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
It just seems like lazy code. You can see that, later, there is = this:

{code}

        for(Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
          childUGI.addToken(token);
        }

{code}

So eventually the JobToken is getting added to the UGI which runs task-code.
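To spell out the pattern (a hedged sketch of the intent, not the literal Child.java source; the job.getUser() lookup in particular is a stand-in for however the submitting user is actually obtained):

{code}
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TaskAttemptID;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

class ChildUgiSketch {
  static UserGroupInformation buildChildUgi(TaskAttemptID firstTaskid,
                                            Token<?> jt,
                                            JobConf job) throws Exception {
    // UGI named after the job id: only authenticates the umbilical RPC to the TaskTracker.
    UserGroupInformation taskOwner =
        UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
    taskOwner.addToken(jt);

    // UGI for the real user; the task body is meant to run inside childUGI.doAs(...).
    UserGroupInformation childUGI =
        UserGroupInformation.createRemoteUser(job.getUser());  // stand-in for the real user lookup
    for (Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
      childUGI.addToken(token);  // the loop quoted above, so the JobToken travels with the task
    }
    return childUGI;
  }
}
{code}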

>  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002

This seems to be a problem. When the task tries to reach the NameNode, it should do so as the user, not the job-id. It is not just logging; I'd be surprised if jobs pass. Do you have permissions enabled on HDFS?

Oh, or is this in non-secure mode (i.e. without kerberos)?

+Vinod


On Jan 7, 2014, at 5:14 PM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:

> Hi,
>
> I looked at Hadoop 1.X source code and found some logic that I could not understand.
>
> In the org.apache.hadoop.mapred.Child class, there were two UGIs defined as follows.
>
>     UserGroupInformation current = UserGroupInformation.getCurrentUser();
>     current.addToken(jt);
>
>     UserGroupInformation taskOwner
>       = UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
>     taskOwner.addToken(jt);
>
> But it is the taskOwner that is actually passed as a UGI to the task tracker and then to HDFS. The first one is not referenced anywhere.
>
>     final TaskUmbilicalProtocol umbilical =
>       taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
>         @Override
>         public TaskUmbilicalProtocol run() throws Exception {
>           return (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
>               TaskUmbilicalProtocol.versionID,
>               address,
>               defaultConf);
>         }
>     });
>
> What puzzled me is that the job id is actually passed in as the user name to the task tracker. On the NameNode side, when it tries to map the non-existent user name (i.e., the job id) to a group, it always returns an empty array. As a result, we always see annoying warning messages such as
>
>  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002
>
> Sometimes the warning messages were thrown so fast, hundreds or even thousands per second on a big cluster, that system performance degraded dramatically.
>
> Could someone please explain why this logic was designed this way? Is there any benefit to using a non-existent user for the group mapping? Or is this a bug?
>
> Thanks in advance,
>
> John


-----------------------------------------
Chris Schneider
http://www.scaleunlimited.com
custom big data solutions
-----------------------------------------
