hadoop-mapreduce-user mailing list archives

From xeonmailinglist-gmail <xeonmailingl...@gmail.com>
Subject Re: can't submit remote job
Date Tue, 19 May 2015 17:44:32 GMT
I am still looking for a way to solve this. It seems that the Namenode 
can't write to the Datanode, but I am not sure.
While debugging the code, I have found that Hadoop keeps decreasing 
the number of available nodes in [1].

If this is correct, I don't understand why it happens, because my 
system is well configured. So, I still have no answer for this. If 
someone could give me a hint, I would appreciate it.

|org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault, line 624:

protected DatanodeStorageInfo chooseRandom(...) {
     while(numOfReplicas > 0 && numOfAvailableNodes > 0) {
       DatanodeDescriptor chosenNode =
           (DatanodeDescriptor)clusterMap.chooseRandom(scope);
       if (excludedNodes.add(chosenNode)) { //was not in the excluded list
         if (LOG.isDebugEnabled()) {
           builder.append("\nNode ").append(NodeBase.getPath(chosenNode)).append(" [");
         }
         numOfAvailableNodes--;
         (....)
     }
}
|
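
One pattern I am checking on my side (this is only an assumption on my part, not something the logs confirm): with a remote client, if the client cannot reach the Datanode on the address the Namenode hands back (e.g. an internal IP), the client marks that Datanode as excluded, which would explain "1 node(s) are excluded" when only 1 datanode is running. A minimal sketch of the client-side setting that makes the client connect by hostname instead of the Datanode's reported IP (assuming the hostname is resolvable from the client):

|<!-- hdfs-site.xml on the remote client (sketch) -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
|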

On 05/19/2015 05:44 PM, Rajesh Kartha wrote:

> Wondering if you have used the REST API to submit jobs:
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application
>
> There are some issues that I have come across, but it does seem to work.
>
>
> Also the message:
> java.io.IOException: File /tmp/hadoop-yarn/staging/xeon/.staging/job_1432045089375_0001/job.split could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>
> One node is excluded from the operation; do you know why it is excluded?
>
>
> -Rajesh
>
>
> On Tue, May 19, 2015 at 7:34 AM, xeonmailinglist-gmail 
> <xeonmailinglist@gmail.com> wrote:
>
>     This has been a real struggle to launch a remote MapReduce job. I
>     know that there is Netflix Genie to submit the job, but for the
>     purpose of this application (small and personal), I want to code
>     it from scratch.
>
>     I am debugging my code to see what is going on during the
>     submission of a remote job, and now I have the error [1]. This
>     error happens during the submission of the job, more precisely
>     when it is writing to the remote HDFS. I have included in [2] the
>     Hadoop code where I get the error. The error [1] happens in the
>     |out.close()| instruction of [2].
>
>     The Namenode and the datanodes are working properly. I have 1
>     Namenode and 1 datanode. The replication factor is set to 1.
>     Even though everything is running OK, I get this error. Any hint
>     so that I can see what is going on?
>
>     [1]
>
>     |2015-05-19 10:21:03,147 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
>     2015-05-19 10:21:03,147 DEBUG org.apache.hadoop.ipc.Server: Served: addBlock queueTime= 1 procesingTime= 1 exception= IOException
>     2015-05-19 10:21:03,147 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:xeon (auth:SIMPLE) cause:java.io.IOException: File /tmp/hadoop-yarn/staging/xeon/.staging/job_1432045089375_0001/job.split could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>     2015-05-19 10:21:03,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 194.117.18.101:39006 Call#18 Retry#0
>     java.io.IOException: File /tmp/hadoop-yarn/staging/xeon/.staging/job_1432045089375_0001/job.split could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
>              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
>              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
>              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
>              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>              at java.security.AccessController.doPrivileged(Native Method)
>              at javax.security.auth.Subject.doAs(Subject.java:422)
>              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>     2015-05-19 10:21:03,148 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: responding to org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 194.117.18.101:39006 Call#18 Retry#0
>     |
>
>     [2]
>
>     |  public static <T extends InputSplit> void createSplitFiles(Path jobSubmitDir,
>            Configuration conf, FileSystem fs, T[] splits)
>        throws IOException, InterruptedException {
>          FSDataOutputStream out = createFile(fs,
>              JobSubmissionFiles.getJobSplitFile(jobSubmitDir), conf);
>          SplitMetaInfo[] info = writeNewSplits(conf, splits, out);
>          out.close();
>          writeJobSplitMetaInfo(fs,JobSubmissionFiles.getJobSplitMetaFile(jobSubmitDir),
>              new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION), splitVersion,
>              info);
>        }
>     |
>
>     Thanks,
>
>     On 05/18/2015 03:56 PM, xeonmailinglist-gmail wrote:
>
>>     Shahab, I think so, but the Hadoop site says |The user@ mailing
>>     list is the preferred mailing list for end-user questions and
>>     discussion.| So I am using the right mailing list.
>>
>>     Back to my problem, I think that this is a problem about HDFS
>>     security. But the strangest thing is that I have disabled it in
>>     |hdfs-site.xml| [1].
>>
>>     I think that this error happens when MapReduce is trying to write
>>     the job configuration files in HDFS.
>>
>>     I have created the remote client's username on the Namenode
>>     host using the commands in [2].
>>
>>     Now, I am looking at Netflix Genie to figure out how they do
>>     it, but right now I still haven't found a solution to submit a
>>     remote job using Java. If anyone has a hint or advice, please
>>     tell me. I really don't understand why I get this error.
>>
>>     [1]
>>
>>     |$ cat etc/hadoop/hdfs-site.xml
>>
>>     <property> <name>dfs.permissions</name> <value>false</value> </property>
>>     |
>>
>>     [2]
>>
>>     |```
>>         In Namenode host.
>>
>>         $ sudo adduser xeon
>>         $ sudo adduser xeon ubuntu
>>     |
>>
>>     ```
>>
>>     On 05/18/2015 02:46 PM, Shahab Yunus wrote:
>>
>>>     I think the poster wanted to unsubscribe from the mailing list?
>>>
>>>     Gopy, if that is the case, then please see this for that:
>>>     https://hadoop.apache.org/mailing_lists.html
>>>
>>>     Regards,
>>>     Shahab
>>>
>>>     On Mon, May 18, 2015 at 9:42 AM, xeonmailinglist-gmail
>>>     <xeonmailinglist@gmail.com> wrote:
>>>
>>>         Why "Remove"?
>>>
>>>
>>>         On 05/18/2015 02:25 PM, Gopy Krishna wrote:
>>>>         REMOVE
>>>>
>>>>         On Mon, May 18, 2015 at 6:54 AM, xeonmailinglist-gmail
>>>>         <xeonmailinglist@gmail.com> wrote:
>>>>
>>>>             Hi,
>>>>
>>>>             I am trying to submit a remote job to Yarn MapReduce,
>>>>             but I can't because I get the error [1]. I don't have
>>>>             any other exceptions in the other logs.
>>>>
>>>>             My MapReduce runtime has 1 /ResourceManager/ and 3
>>>>             /NodeManagers/, and HDFS is running properly (all
>>>>             nodes are alive).
>>>>
>>>>             I have looked at all the logs, and I still don't
>>>>             understand why I get this error. Any help to fix this?
>>>>             Is it a problem with the remote job that I am submitting?
>>>>
>>>>             [1]
>>>>
>>>>             |$ less logs/hadoop-ubuntu-namenode-ip-172-31-17-45.log
>>>>
>>>>             2015-05-18 10:42:16,570 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameNode.addBlock: file /tmp/hadoop-yarn/staging/xeon/.staging/job_1431945660897_0001/job.split fileId=16394 for DFSClient_NONMAPREDUCE_-1923902075_14
>>>>             2015-05-18 10:42:16,570 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: /tmp/hadoop-yarn/staging/xeon/.staging/job_1431945660897_0001/job.split inodeId 16394 for DFSClient_NONMAPREDUCE_-1923902075_14
>>>>             2015-05-18 10:42:16,571 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to choose remote rack (location = ~/default-rack), fallback to local rack
>>>>             org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:691)
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:126)
>>>>                      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1545)
>>>>                      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
>>>>                      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
>>>>             |
>>>>
>>>>         -- 
>>>>         Thanks & Regards
>>>>         Gopy
>>>>         Rapisource LLC
>>>>         Direct: 732-419-9663
>>>>         Fax: 512-287-4047
>>>>         Email: gopy@rapisource.com
>>>>         www.rapisource.com
>>>>         http://www.linkedin.com/in/gopykrishna
>>>>
>>>>         According to Bill S.1618 Title III passed by the 105th US
>>>>         Congress,this message is not considered as "Spam" as we
>>>>         have included the contact information.If you wish to be
>>>>         removed from our mailing list, please respond with "remove"
>>>>         in the subject field.We apologize for any inconvenience caused.
>>>
>>>         -- 
>>>         --
>>>
>>>
>
>