hadoop-mapreduce-user mailing list archives

From Jian Fang <jian.fang.subscr...@gmail.com>
Subject Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradation
Date Wed, 08 Jan 2014 19:18:04 GMT
Thanks, Vinod, for your quick response. The cluster is running in non-secure mode.

I still don't understand the purpose of using the job id as the user name in
the UGI. Could you please explain a bit more?

Thanks,

John


On Wed, Jan 8, 2014 at 10:11 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> It just seems like lazy code. You can see that, later, there is this:
>
> {code}
>
>         for(Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
>           childUGI.addToken(token);
>         }
>
> {code}
>
> So eventually the JobToken is getting added to the UGI which runs
> task-code.
>
> >  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002
>
> This seems to be a problem. When the task tries to reach the NameNode, it
> should do so as the user, not the job-id. This is more than a logging issue;
> I'd be surprised if jobs pass. Do you have permissions enabled on HDFS?
>
> Oh, or is this in non-secure mode (i.e. without Kerberos)?
>
> +Vinod
>
>
> On Jan 7, 2014, at 5:14 PM, Jian Fang <jian.fang.subscribe@gmail.com>
> wrote:
>
> > Hi,
> >
> > I looked at the Hadoop 1.x source code and found some logic that I could
> > not understand.
> >
> > In the org.apache.hadoop.mapred.Child class, two UGIs are defined as
> > follows.
> >
> >     UserGroupInformation current = UserGroupInformation.getCurrentUser();
> >     current.addToken(jt);
> >
> >     UserGroupInformation taskOwner
> >      = UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
> >     taskOwner.addToken(jt);
> >
> > But it is the taskOwner that is actually passed as a UGI to the
> > TaskTracker and then to HDFS. The first one is not referenced anywhere.
> >
> >     final TaskUmbilicalProtocol umbilical =
> >       taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
> >         @Override
> >         public TaskUmbilicalProtocol run() throws Exception {
> >           return (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
> >               TaskUmbilicalProtocol.versionID,
> >               address,
> >               defaultConf);
> >         }
> >     });
> >
> > What puzzles me is that the job id is passed as the user name to the
> > TaskTracker. On the NameNode side, when it tries to map this non-existent
> > user name, i.e., the job id, to groups, it always gets an empty array. As
> > a result, we always see annoying warning messages such as
> >
> >  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002
> >
> > Sometimes the warnings are logged so fast, hundreds or even thousands
> > per second on a big cluster, that system performance degrades
> > dramatically.
> >
> > Could someone please explain why this logic was designed this way? Is
> > there any benefit to using a non-existent user for the group mapping? Or
> > is this a bug?
> >
> > Thanks in advance,
> >
> > John
>
>

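The behavior described in this thread can be modeled without Hadoop. The sketch below is a plain-Java stand-in, not Hadoop's actual group mapping code: `GroupLookupSketch`, its in-memory user table, and the user `john` are hypothetical names introduced only for illustration. It assumes, as the quoted log line suggests, that every lookup for a name with no groups produces one warning, so a job id used as a user name costs one warning per lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a user-to-groups lookup; not Hadoop code.
class GroupLookupSketch {
    // Stand-in for the system's user-to-groups table (assumption).
    private final Map<String, List<String>> knownUsers = new HashMap<>();
    // Number of "No groups available" warnings this model would have logged.
    private int warnings = 0;

    public void addUser(String user, String... groups) {
        knownUsers.put(user, Arrays.asList(groups));
    }

    // Returns the groups for a user, or an empty list for an unknown name.
    // Each empty result counts as one warning, mirroring the log in the thread.
    public List<String> getGroups(String user) {
        List<String> groups =
            knownUsers.getOrDefault(user, Collections.<String>emptyList());
        if (groups.isEmpty()) {
            warnings++;
        }
        return groups;
    }

    public int warningCount() {
        return warnings;
    }

    public static void main(String[] args) {
        GroupLookupSketch lookup = new GroupLookupSketch();
        lookup.addUser("john", "hadoop"); // hypothetical real user

        // A job id is not a real user, so every lookup comes back empty.
        for (int i = 0; i < 1000; i++) {
            lookup.getGroups("job_201401071758_0002");
        }
        System.out.println("warnings for job id: " + lookup.warningCount());

        // A real user resolves to groups and adds no warnings.
        System.out.println("groups for john: " + lookup.getGroups("john"));
        System.out.println("warnings in total: " + lookup.warningCount());
    }
}
```

In this model the warning count grows linearly with the number of lookups made under the job id, while lookups under a real user name add none, which is consistent with the suggestion above that the task should reach the NameNode as the user rather than as the job id.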