hadoop-mapreduce-dev mailing list archives

From Devaraj Das <d...@hortonworks.com>
Subject Re: In the constructor of JobInProgress, why is it safe to call FileSystem.closeAllForUGI()?
Date Fri, 29 Mar 2013 15:08:50 GMT
Hey Xiao, IIRC the UGI will be different for the second ec2-user job
submission, so they should not interfere...
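To make the point above concrete: in Hadoop 1.x the FileSystem cache key includes the UGI, and UGI equality is effectively identity-based (tied to the JAAS Subject instance), so two RPC calls from the same OS user still hold distinct cache entries. The following is a hypothetical stand-in sketch, not the real Hadoop classes; the names `Ugi`, `UgiCacheSketch`, and `closeAllForUGI` here are illustrative stand-ins only:

```java
import java.util.HashMap;
import java.util.Map;

public class UgiCacheSketch {
    // Stand-in for UserGroupInformation: same user name, but no
    // equals()/hashCode() override, so keys compare by instance
    // identity, mirroring Subject-identity comparison in Hadoop 1.x.
    static final class Ugi {
        final String userName;
        Ugi(String userName) { this.userName = userName; }
    }

    static final Map<Ugi, String> cache = new HashMap<>();

    // Drops only the entries keyed by this exact UGI instance.
    static void closeAllForUGI(Ugi ugi) {
        cache.remove(ugi);
    }

    public static void main(String[] args) {
        Ugi job1Ugi = new Ugi("ec2-user");
        Ugi job2Ugi = new Ugi("ec2-user"); // same user, separate RPC call

        cache.put(job1Ugi, "fs-for-job1");
        cache.put(job2Ugi, "fs-for-job2");

        closeAllForUGI(job2Ugi); // job2's constructor cleanup

        System.out.println(cache.containsKey(job1Ugi)); // job1's fs survives
        System.out.println(cache.size());
    }
}
```

Under that identity-equality assumption, job2's cleanup cannot touch job1's cached filesystem even though both belong to ec2-user; if the two submissions ever ended up sharing one UGI instance, that protection would vanish.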
 On Mar 22, 2013 8:05 AM, "Xiao Yu" <cutefish.yx@gmail.com> wrote:

> Thanks a lot for the reply.
>
> I have made several changes here and there, but I do not think they are
> quite relevant (the most relevant modification to the control flow of JIP
> construction is at the end of JobInProgress$initTasks(), where I add all
> the map tasks of this job into a data structure I maintain in the JT).
> May I describe a case that I think might be a problem, and you can help
> me identify whether it is a bug or I am not using Hadoop correctly?
>
> 1. In my environment, there is only one user, ec2-user, which is the owner
> of the JT, the submitter of all the jobs, the owner of HDFS, etc.
>
> 2.
> (1) job1 is submitted and its JIP, jip1, is constructed successfully.
> (2) jip1 starts its initTasks() function, which calls
> generateAndStoreTokens(), which calls tokenStorage.writeTokenStorageFile().
> That opens a JobToken file, jobToken1, and writes to it but does not close
> it yet. This file is opened with ownership ec2-user.
> (3) job2 is submitted, and in the finally clause of its constructor it
> calls FileSystem.closeAllForUGI('ec2-user'), which in turn calls
> dfsClient.close().
> (4) In version 1.1.0 and before, dfsClient has a leaseChecker object that
> tracks all files being written and their output streams. dfsClient.close()
> calls leaseChecker.close(), which pops out the files one by one inside a
> synchronized block but closes the related output stream outside the block.
> So in the jip2 constructor, FileSystem.closeAllForUGI('ec2-user') gets
> *jobToken1* and its output stream, but does not close it yet.
> (5) The stream opened in (2) by jip1 closes jobToken1.
> (6) The close from (4) in the jip2 constructor then tries to close the
> same stream of jobToken1 and fails with a NullPointerException.
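The pop-under-lock / close-outside-lock pattern in steps (4)-(6) can be made concrete with a minimal sketch. This is a hypothetical stand-in, not the real DFSClient code: `FakeOutputStream` and `LeaseCloseRaceSketch` are invented names, and the sketch deterministically replays the double close rather than racing two threads:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LeaseCloseRaceSketch {
    // Stand-in for DFSOutputStream: close() nulls its worker thread,
    // so a second close() dereferences null and throws NPE.
    static final class FakeOutputStream {
        private Thread streamer = new Thread(() -> {});
        void close() {
            streamer.interrupt(); // NPE here on the second close
            streamer = null;
        }
    }

    private final Deque<FakeOutputStream> pending = new ArrayDeque<>();

    void register(FakeOutputStream s) {
        synchronized (pending) { pending.add(s); }
    }

    // Mirrors the reported pattern: remove each stream under the lock,
    // but call close() on it after releasing the lock, so another
    // caller may close the same stream in between.
    void closeAll() {
        while (true) {
            FakeOutputStream s;
            synchronized (pending) {
                s = pending.poll();
            }
            if (s == null) return;
            s.close(); // outside the lock
        }
    }

    public static void main(String[] args) {
        LeaseCloseRaceSketch checker = new LeaseCloseRaceSketch();
        FakeOutputStream jobToken1 = new FakeOutputStream();
        checker.register(jobToken1);

        jobToken1.close();      // like jip1 closing its own jobToken stream
        try {
            checker.closeAll(); // like jip2's closeAllForUGI closing it again
        } catch (NullPointerException e) {
            System.out.println("NPE on double close");
        }
    }
}
```

Because the stream leaves the synchronized block before being closed, nothing prevents the owner's own close() from running first; the second close then sees already-nulled state, matching the NullPointerException in the stack trace below.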
>
> 3. Even if the leaseChecker object is not used in later versions, I still
> do not understand why it is safe to call closeAllForUGI('ec2-user') in the
> job2 constructor when job1 could also be writing a jobToken file in
> initTasks() under the same 'ec2-user' ownership.
>
> Thanks.
>
> Xiao Yu
>
>
> On Fri, Mar 22, 2013 at 7:30 AM, Harsh J <harsh@cloudera.com> wrote:
>
> > The current user (UserGroupInformation.getCurrentUser()) is the user
> > active in the RPC call that's invoking these functions, and not exactly
> > the JT user.
> >
> > However, given that the JIP construction is outside of a synchronized
> > step and can potentially happen in parallel with another JIP request,
> > it is possible that you have identified a bug here.
> >
> > I've not seen this happen though, even at high loads of submits from a
> > single user (where I think this could happen). Can you detail your
> > changes, because it could be somewhat related to that as well? The UGI
> > comparison inside of closeAllForUGI is probably protective enough, but
> > it'd still be worth looking into.
> >
> > On Fri, Mar 22, 2013 at 4:12 AM, Xiao Yu <cutefish.yx@gmail.com> wrote:
> > > Hi,
> > >
> > > This might be a naive question, but I am having a difficult time
> > > understanding it. At the end of the constructor of JobInProgress, in
> > > the finally clause, the code calls
> > > FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser()). Why
> > > is it safe?
> > >
> > > My concern is that the current user is the owner of the jobtracker, so
> > > it will close all the files the jobtracker is writing, such as a
> > > jobtoken file that another jip is currently writing.
> > >
> > > I modified some of the code of hadoop-1.1.0 for my research project
> > > and saw the following error. It could be some bug in my code, but I
> > > suspect it is a combined effect of this closeAllForUGI function and
> > > perhaps a race condition in DFSClient$LeaseChecker.close().
> > >
> > > Could you help me understand why it is safe to call this
> > > FileSystem.closeAllForUGI function at the end of the JobInProgress
> > > constructor?
> > >
> > > Thank you very much indeed.
> > >
> > > ===
> > >
> > > 2013-03-21 21:04:56,677 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201303212104_0005
> > > 2013-03-21 21:04:56,677 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201303212104_0005
> > > 2013-03-21 21:04:56,841 INFO org.apache.hadoop.hdfs.DFSClient: read.type=dfs block=blk_700146088908855679_2191 src=/home/ec2-user/Hadoop/tmp/mapred/staging/ec2-user/.staging/job_201303212104_0006/job.xml
> > > 2013-03-21 21:04:56,897 INFO org.apache.hadoop.hdfs.DFSClient: ===streamer closed here===:org.apache.hadoop.hdfs.DFSClient$DFSOutputStream@239cd5f5dfsclient = DFSClient[clientName=DFSClient_-536745704, ugi=ec2-user] leasechecker remove source: /test/out/intrecord/4/_logs/history/job_201303212104_0005_conf.xml
> > > 2013-03-21 21:04:56,898 INFO org.apache.hadoop.mapred.JobInProgress: ===generating job token===
> > > 2013-03-21 21:04:56,931 INFO org.apache.hadoop.mapred.JobInProgress: job_201303212104_0006: nMaps=1 nReduces=0 max=-1
> > > 2013-03-21 21:04:56,932 INFO org.apache.hadoop.mapred.JobInProgress: ===closeAllForUGI here===:ec2-user jobid= job_201303212104_0006
> > > *2013-03-21 21:04:56,973 INFO org.apache.hadoop.hdfs.DFSClient: ===streamer closed here===:org.apache.hadoop.hdfs.DFSClient$DFSOutputStream@6d56d7c8dfsclient = DFSClient[clientName=DFSClient_1735333485, ugi=ec2-user] leasechecker remove source: /home/ec2-user/Hadoop/tmp/mapred/system/job_201303212104_0005/jobToken*
> > > 2013-03-21 21:04:56,973 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /home/ec2-user/Hadoop/tmp/mapred/system/job_201303212104_0005/jobToken
> > > *2013-03-21 21:04:56,973 INFO org.apache.hadoop.hdfs.DFSClient: ===streamer null to close===:org.apache.hadoop.hdfs.DFSClient$DFSOutputStream@6d56d7c8dfsclient = DFSClient[clientName=DFSClient_1735333485, ugi=ec2-user] leasechecker should remove source: /home/ec2-user/Hadoop/tmp/mapred/system/job_201303212104_0005/jobToken*
> > > 2013-03-21 21:04:56,974 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ec2-user cause:java.io.IOException: java.lang.NullPointerException
> > > *2013-03-21 21:04:56,976 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9101, call submitJob(job_201303212104_0006, hdfs://10.151.93.195:9100/home/ec2-user/Hadoop/tmp/mapred/staging/ec2-user/.staging/job_201303212104_0006, org.apache.hadoop.security.Credentials@6070c38c) from 10.151.93.195:41602: error: java.io.IOException: java.lang.NullPointerException*
> > > java.io.IOException: java.lang.NullPointerException
> > >     at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3799)
> > >     at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3763)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >     at java.lang.reflect.Method.invoke(Method.java:616)
> > >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
> > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
> > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
> > >     at java.security.AccessController.doPrivileged(Native Method)
> > >     at javax.security.auth.Subject.doAs(Subject.java:416)
> > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
> > >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> > > Caused by: java.lang.NullPointerException
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:4177)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:4220)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:4145)
> > >     at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1475)
> > >     at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:341)
> > >     at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:339)
> > >     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1499)
> > >     at org.apache.hadoop.fs.FileSystem.closeAllForUGI(FileSystem.java:289)
> > >     at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:498)
> > >     at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3797)
> > >     ... 12 more
> > > 2013-03-21 21:04:56,979 INFO org.apache.hadoop.hdfs.DFSClient: read.type=dfs block=blk_972265969568196888_2183 src=/home/ec2-user/Hadoop/tmp/mapred/staging/ec2-user/.staging/job_201303212104_0005/job.splitmetainfo
> > >
> > >
> > > Xiao Yu
> >
> >
> >
> > --
> > Harsh J
> >
>
