hadoop-mapreduce-user mailing list archives

From Krishna Rao <kris...@blinkbox.com>
Subject Re: Intermittent BindException during long MR jobs
Date Wed, 25 Mar 2015 17:03:54 GMT
Thanks for the responses. In our case the port is 0, and the page Ted linked
(http://wiki.apache.org/hadoop/BindException) says that a port collision is highly unlikely:

"If the port is "0", then the OS is looking for any free port -so the port-in-use and port-below-1024
problems are highly unlikely to be the cause of the problem."

I think load may be the culprit since the nodes will be heavily used during the times that
the exception occurs.
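
To illustrate the wiki's point about port 0, here is a minimal sketch (plain java.net, not Hadoop code; the class and method names are hypothetical) of binding a client socket to port 0 so that the OS picks any free port. The loopback address stands in for the node's own address.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

class EphemeralBind {
    /** Binds a socket to the given local address with port 0 and returns the port the OS chose. */
    static int bindToAnyPort(InetAddress local) throws IOException {
        try (Socket s = new Socket()) {
            // Port 0 asks the OS for any free port on this address.
            s.bind(new InetSocketAddress(local, 0));
            return s.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress local = InetAddress.getLoopbackAddress();
        System.out.println("OS assigned port " + bindToAnyPort(local));
    }
}
```

With port 0 the bind can still fail for reasons tied to the address rather than the port: an address that is not configured on any local interface typically produces "Cannot assign requested address", the same message as in the stack trace.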

Is there any way to set or increase the timeout for the call/connection attempt? In all cases
so far it seems to be on a call to delete a file in HDFS. I searched the HDFS code base but
couldn't find an obvious way to set a timeout, or anywhere that one is being set.
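
Since no timeout setting has been identified in this thread, one possible workaround is to retry the failing call at the call site, assuming the failure is transient under load. The sketch below is plain Java, not a Hadoop API; BindRetry and withRetry are hypothetical names, and the simulated operation in main stands in for the real HDFS call.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

class BindRetry {
    /**
     * Runs op, retrying with exponential backoff when it throws BindException.
     * Any other exception, or a BindException on the last attempt, is rethrown.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long initialBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (BindException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2; // wait twice as long before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice with BindException, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new BindException("Cannot assign requested address");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real job the Callable would wrap the HDFS operation that fails, for example a FileSystem.delete call. This only helps where the calling code is under your control; it does not change how Hive or the job client submits jobs.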


Krishna


On 28 February 2015 at 15:20, Ted Yu <yuzhihong@gmail.com> wrote:
Krishna:
Please take a look at:
http://wiki.apache.org/hadoop/BindException

Cheers

On Thu, Feb 26, 2015 at 10:30 PM, <hadoop.support@visolve.com> wrote:
Hello Krishna,

The exception seems to be IP-specific. It may have occurred because the IP address was not
available on the system to bind to. Double-check that the IP address is available and rerun the job.
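
One way to check address availability, as a rough sketch in plain Java (the class and method names are hypothetical): resolve the host name and ask whether the resulting address is configured on any local network interface. It would need to be run on the node that reports the error, with that node's name or IP.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

class LocalAddressCheck {
    /** Returns true if hostOrIp resolves to an address configured on a local network interface. */
    static boolean isLocallyAssigned(String hostOrIp) throws UnknownHostException, SocketException {
        InetAddress addr = InetAddress.getByName(hostOrIp);
        // getByInetAddress returns null when no local interface carries this address.
        return NetworkInterface.getByInetAddress(addr) != null;
    }

    public static void main(String[] args) throws Exception {
        // Pass the node's own host name or IP, e.g. the name shown in the stack trace.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " assigned locally: " + isLocallyAssigned(host));
    }
}
```

A result of false would suggest that the host name resolves to an address the node does not actually own, for example through a stale hosts entry.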

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com
email: services@visolve.com | Phone: 408-850-2243


From: Krishna Rao [mailto:krishnanjrao@gmail.com]
Sent: Thursday, February 26, 2015 9:48 PM
To: user@hive.apache.org; user@hadoop.apache.org
Subject: Intermittent BindException during long MR jobs

Hi,

We occasionally run into a BindException that causes long-running jobs to fail.

The stacktrace is below.

Any ideas what this could be caused by?

Cheers,

Krishna


Stacktrace:
379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see:  http://wiki.apache.org/hadoop/BindException)'
java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see:  http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
        at org.apache.hadoop.ipc.Client.call(Client.java:1242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at com.sun.proxy.$Proxy10.create(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
        at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at com.sun.proxy.$Proxy11.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1376)
        at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:56)





Krishna Rao
Senior Development Engineer Lead
t: +44 (0)1865 747960
m:
blinkbox music - the easiest way to listen to the music you love, for free
www.blinkboxmusic.com

