hadoop-common-user mailing list archives

From Jian Fang <jian.fang.subscr...@gmail.com>
Subject Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradation
Date Wed, 08 Jan 2014 21:26:48 GMT
I looked a bit deeper, and it seems this code was introduced by the
following JIRA.

https://issues.apache.org/jira/browse/MAPREDUCE-1457

There is another related JIRA, i.e.,
https://issues.apache.org/jira/browse/MAPREDUCE-4329.

Perhaps the warning message is a side effect of MAPREDUCE-1457 when the
cluster is running in non-secure mode. There must be some code path in the
task tracker or job tracker that treats the job id as the user name and
then passes it on to the HDFS name node. This is definitely a big problem,
since the heavy warning logging alone degrades system performance on a
relatively large cluster.
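
To illustrate what I think happens on the name node side, here is a minimal sketch in plain Java. It is not Hadoop code: the class GroupLookupSketch, the method lookupGroups, and the hard-coded "hadoop" account are all made up for illustration. It assumes only that a user name with no matching account resolves to an empty group list and produces one warning per lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a server-side group lookup; not Hadoop code.
class GroupLookupSketch {
    // Assumed accounts known to the group mapping (illustrative only).
    private static final Map<String, List<String>> GROUPS = new HashMap<>();
    static {
        GROUPS.put("hadoop", Arrays.asList("hadoop", "supergroup"));
    }

    // Number of warnings that would have been written to the log.
    static int warnings = 0;

    // Returns the groups for a user name, or an empty list plus one
    // warning when the name does not match any known account.
    static List<String> lookupGroups(String userName) {
        List<String> groups = GROUPS.get(userName);
        if (groups == null) {
            warnings++;
            System.err.println("WARN No groups available for user " + userName);
            return Collections.emptyList();
        }
        return groups;
    }

    public static void main(String[] args) {
        // A real account resolves to its groups.
        System.out.println(lookupGroups("hadoop"));
        // A job id used as the user name never resolves, so every
        // lookup adds another warning line.
        for (int i = 0; i < 3; i++) {
            lookupGroups("job_201401071758_0002");
        }
        System.out.println("warnings logged: " + warnings);
    }
}
```

If every request made under the job id goes through a lookup like this, the number of warning lines grows with the request rate, which would match what we see in the logs.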

This behavior is very easy to reproduce by simply running terasort on a
cluster.

Any suggestions on how to fix this problem?




On Wed, Jan 8, 2014 at 11:18 AM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:

> Thanks Vinod for your quick response. It is running in non-secure mode.
>
> I still don't understand the purpose of using the job id in the UGI. Could
> you please explain a bit more?
>
> Thanks,
>
> John
>
>
> On Wed, Jan 8, 2014 at 10:11 AM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
>> It just seems like lazy code. You can see that, later, there is this:
>>
>> {code}
>>
>>         for (Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
>>           childUGI.addToken(token);
>>         }
>>
>> {code}
>>
>> So eventually the JobToken is getting added to the UGI which runs
>> task-code.
>>
>> >  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002
>>
>> This seems to be a problem. When the task tries to reach the NameNode, it
>> should do so as the user, not the job-id. It is not just logging; I'd be
>> surprised if jobs pass. Do you have permissions enabled on HDFS?
>>
>> Oh, or is this in non-secure mode (i.e. without kerberos)?
>>
>> +Vinod
>>
>>
>> On Jan 7, 2014, at 5:14 PM, Jian Fang <jian.fang.subscribe@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I looked at Hadoop 1.X source code and found some logic that I could
>> > not understand.
>> >
>> > In the org.apache.hadoop.mapred.Child class, there were two UGIs
>> > defined as follows.
>> >
>> >     UserGroupInformation current = UserGroupInformation.getCurrentUser();
>> >     current.addToken(jt);
>> >
>> >     UserGroupInformation taskOwner =
>> >       UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
>> >     taskOwner.addToken(jt);
>> >
>> > But it is the taskOwner that is actually passed as a UGI to the task
>> > tracker and then to HDFS. The first one is not referenced anywhere.
>> >
>> >     final TaskUmbilicalProtocol umbilical =
>> >       taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
>> >         @Override
>> >         public TaskUmbilicalProtocol run() throws Exception {
>> >           return (TaskUmbilicalProtocol) RPC.getProxy(TaskUmbilicalProtocol.class,
>> >               TaskUmbilicalProtocol.versionID,
>> >               address,
>> >               defaultConf);
>> >         }
>> >     });
>> >
>> > What puzzled me is that the job id is actually passed in as the user
>> > name to the task tracker. On the name node side, when it tries to map
>> > the non-existent user name, i.e., the job id, to a group, it always
>> > returns an empty array. As a result, we always see annoying warning
>> > messages such as
>> >
>> >  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 on 9000): No groups available for user job_201401071758_0002
>> >
>> > Sometimes the warning messages were logged so fast, hundreds or even
>> > thousands per second on a big cluster, that system performance
>> > degraded dramatically.
>> >
>> > Could someone please explain why this logic was designed this way? Is
>> > there any benefit to using a non-existent user for the group mapping?
>> > Or is this a bug?
>> >
>> > Thanks in advance,
>> >
>> > John
>>
>>
>>
>
>
