giraph-dev mailing list archives

From "Antonio Barbuzzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-859) Yarn-based Giraph user woes
Date Thu, 13 Nov 2014 11:48:34 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209632#comment-14209632 ]

Antonio Barbuzzi commented on GIRAPH-859:
-----------------------------------------

I still hit bug 1 (the FileNotFoundException). The workaround of launching the application as the yarn user is not an option for us. Moreover, the patch works only if HDFS permission checking is disabled, so it is not an option either.

> Yarn-based Giraph user woes
> ---------------------------
>
>                 Key: GIRAPH-859
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-859
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alexandre Fonseca
>              Labels: yarn
>         Attachments: GIRAPH-859-incomplete.patch
>
>
> After a lengthy debugging session with Stefan Beskow, prompted by the following post on the mailing list:
> http://mail-archives.apache.org/mod_mbox/giraph-user/201402.mbox/%3C32c1ea4f88ec4fd2bc0815b012c0de48%40MERCMBX25R.na.SAS.com%3E
> I was able to identify several problems that occur when Giraph jobs are submitted through the YARN framework by a user different from the one running the YARN daemons (and HDFS). This is in a scenario with no authentication; the delegation tokens used in a secure environment might avoid these problems.
> h2. First problem
> Since the client user and the AM user differ, the AM is unable to find the HDFS files distributed by the client: it looks in the wrong home directory, /user/<yarn user>/giraph_yarn_jar_cache, instead of /user/<client user>/giraph_yarn_jar_cache:
> {code}
> 14/02/20 18:10:25 INFO yarn.GiraphYarnClient: Made local resource for :/r/sanyo.unx.sas.com/vol/vol410/u41/stbesk/snapshot_from_git/jars/giraph-ex.jar to hdfs://el01cn01.unx.sas.com:8020/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0034/giraph-ex.jar
> {code}
> {code}
> Exception in thread "pool-3-thread-2" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://el01cn01.unx.sas.com:8020/user/yarn/giraph_yarn_jar_cache/application_1392713839733_0034/okapi-0.3.2.jar
> {code}
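The root cause is that the AM resolves the jar cache relative to its own HDFS home directory. A minimal sketch (not the actual Giraph code) of one possible fix: have the client pass its user name to the AM, for example through the container launch environment, and build the path from that explicit owner instead of the user the AM happens to run as. The helper name below is hypothetical.

```java
// Sketch only: shows why resolving the jar cache via the running user's home
// directory fails when client and AM users differ, and how an explicitly
// passed owner avoids the mismatch.
public final class JarCachePath {
    static String jarCachePath(String owner, String appId) {
        // Mirrors the /user/<owner>/giraph_yarn_jar_cache/<appId> layout
        // seen in the logs above.
        return "/user/" + owner + "/giraph_yarn_jar_cache/" + appId;
    }

    public static void main(String[] args) {
        String appId = "application_1392713839733_0034";
        // What happens today: the AM uses its own user (yarn) -> wrong directory.
        String broken = jarCachePath(System.getProperty("user.name"), appId);
        // The fix: the client user name travels in the launch context and
        // the AM uses it instead ("stbesk" here stands in for that value).
        String fixed = jarCachePath("stbesk", appId);
        System.out.println(broken);
        System.out.println(fixed);
    }
}
```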
> h2. Second problem
> The AM attempts to rewrite giraph-conf.xml in the HDFS distributed cache before launching the task containers. Since the client is the one who put that file there in the first place, and the default permissions are rw-r--r--, the AM cannot rewrite the file unless the YARN user also happens to be the HDFS superuser. The same issue occurs at the level of the application's distributed-cache directory when the AM tries to delete or write new files.
> {code}
> Exception in thread "pool-3-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:401)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:532)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:489)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0044/giraph-conf.xml":stbesk:supergroup:-rw-r--r--
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:164)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5430)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5412)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:5374)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2178)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2133)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2086)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:499)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:321)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1429)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
>         at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1904)
>         at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:257)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:421)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:396)
>         ... 6 more
> {code}
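One possible workaround for this second problem (a sketch, not a Giraph patch) is for the client to relax the permissions on the uploaded file so that the AM, running as a different user, can overwrite it. On HDFS that would be a FileSystem.setPermission call on the uploaded path; the same idea is shown below on a local file using only java.nio, which is POSIX-only but runnable anywhere without a cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public final class ConfPermissions {
    // Relax a file's mode to rw-rw-rw- so another local user could rewrite it;
    // the HDFS analogue would set 0666 on giraph-conf.xml after upload.
    static String relaxAndReport(Path p) throws IOException {
        Files.setPosixFilePermissions(p, PosixFilePermissions.fromString("rw-rw-rw-"));
        // Read the bits back so the caller can verify what was applied.
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(p));
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("giraph-conf", ".xml");
        System.out.println(relaxAndReport(p)); // rw-rw-rw-
        Files.delete(p);
    }
}
```

Note this only papers over the problem: world-writable files are their own hazard, so having the AM avoid the rewrite entirely (or the client pre-writing the final conf) would be cleaner.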
> h2. Third problem
> A temporary giraph-conf.xml is created at /tmp/giraph-conf.xml on the host from which the Giraph client submits a job. This file is never deleted, so when different users submit Giraph jobs from the same host, later users fail because they cannot write to that fixed location.
> {code}
> Exception in thread "pool-2-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.FileNotFoundException: /tmp/giraph-conf.xml (Permission denied)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
>         at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:235)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:411)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:386)
>         ... 6 more
> {code}
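A straightforward fix for this third problem (a sketch, assuming nothing else depends on the fixed path) is to replace the hard-coded /tmp/giraph-conf.xml with a uniquely named per-invocation temp file: java.nio creates it with a fresh name owned by the submitting user, so concurrent users on one host never collide.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class ConfExport {
    // Write the serialized configuration to a uniquely named temp file
    // instead of the fixed /tmp/giraph-conf.xml.
    static Path exportConf(String xml) throws IOException {
        Path tmp = Files.createTempFile("giraph-conf-", ".xml");
        Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
        tmp.toFile().deleteOnExit(); // clean up even if the HDFS copy fails
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path a = exportConf("<configuration/>");
        Path b = exportConf("<configuration/>");
        System.out.println(!a.equals(b)); // true: distinct names, no collision
        Files.delete(a);
        Files.delete(b);
    }
}
```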
> And I'm sure there are more, but we stopped debugging at this point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
