hadoop-general mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: problem starting cdh3b2 jobtracker
Date Fri, 06 Aug 2010 10:43:00 GMT
java.io.IOException: Cannot create toBeDeleted in /data1/mapred/local

This line actually points at the solution. Earlier versions of CDH
ignored invalid entries in the list of mapred local directories (for
example, when the jobtracker machine does not have 2 disks like all
the tasktracker machines and is not in the slaves list either). Now it
no longer seems to ignore them and instead tries to operate on every
listed directory. Looks like a major bug, Cloudera folks! I hit this
with CDH3 +320, and my jobtracker machine does not run tasks.

It gets resolved once you correct the mapred local directory list in
the jobtracker machine's config alone, so that every entry is valid on
that machine. However, if this behaviour is permanent, it will cause
problems when syncing the conf between nodes.
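
To check the list before restarting, something like the following
could be run on the jobtracker machine. It is only a rough sketch: I am
assuming the list is the comma-separated value of mapred.local.dir,
the function name check_local_dirs and the /data2 path are made up for
illustration, and the probe only approximates what the jobtracker does
when it creates toBeDeleted.

```python
import os
import tempfile


def check_local_dirs(dir_list):
    """Return {path: problem} for entries that cannot host a subdirectory.

    dir_list is a comma-separated string, assumed here to be the value
    of mapred.local.dir taken from the jobtracker's own config.
    """
    problems = {}
    for path in (p.strip() for p in dir_list.split(",")):
        if not path:
            continue  # skip empty entries such as a trailing comma
        if not os.path.isdir(path):
            problems[path] = "missing or not a directory"
            continue
        try:
            # Create and remove a scratch subdirectory, which roughly
            # mirrors the jobtracker creating toBeDeleted at startup.
            probe = tempfile.mkdtemp(prefix="probe-", dir=path)
            os.rmdir(probe)
        except OSError as exc:
            problems[path] = "cannot create subdirectory: %s" % exc
    return problems


if __name__ == "__main__":
    # Hypothetical value; paste the list from the jobtracker's config.
    value = "/data1/mapred/local,/data2/mapred/local"
    for path, problem in sorted(check_local_dirs(value).items()):
        print("%s: %s" % (path, problem))
```

Any path it reports is a candidate for removal from the jobtracker's
copy of the list.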

On Fri, Jul 2, 2010 at 8:32 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> We installed cdh3b2 0.20.2+320 and saw some strange error in jobtracker log:
>
> 2010-07-02 01:49:31,977 INFO org.apache.hadoop.mapred.JobTracker: JobTracker
> up at: 9001
> 2010-07-02 01:49:31,977 INFO org.apache.hadoop.mapred.JobTracker: JobTracker
> webserver: 50030
> 2010-07-02 01:49:31,988 WARN org.apache.hadoop.mapred.JobTracker: Error
> starting tracker: java.io.IOException: Cannot create toBeDeleted in
> /data1/mapred/local
>    at
> org.apache.hadoop.util.MRAsyncDiskService.<init>(MRAsyncDiskService.java:85)
>    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1688)
>    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:199)
>    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:191)
>    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3765)
>
> 2010-07-02 01:49:32,990 INFO org.apache.hadoop.mapred.JobTracker: Scheduler
> configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> 2010-07-02 01:49:32,991 FATAL org.apache.hadoop.mapred.JobTracker:
> java.net.BindException: Problem binding to
> sjc1-hadoop0.sjc1.ciq.com/10.201.8.204:9001:
> Address already in use
>    at org.apache.hadoop.ipc.Server.bind(Server.java:198)
>    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:261)
>    at org.apache.hadoop.ipc.Server.<init>(Server.java:1043)
>    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:492)
>    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:454)
>    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1628)
>    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:199)
>    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:191)
>    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3765)
> Caused by: java.net.BindException: Address already in use
>    at sun.nio.ch.Net.bind(Native Method)
>    at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>    at org.apache.hadoop.ipc.Server.bind(Server.java:196)
>    ... 8 more
>
> 2010-07-02 01:49:32,992 INFO org.apache.hadoop.mapred.JobTracker:
> SHUTDOWN_MSG:
>
> But 9001 wasn't used:
> [sjc1-hadoop0.sjc1:hadoop 25618]netstat -nta | grep 9001
> [sjc1-hadoop0.sjc1:hadoop 25619]netstat -nta | grep 9000
> tcp        0      0 10.201.8.204:9000           0.0.0.0:*                   LISTEN
> tcp        0      0 10.201.8.204:9000           10.201.8.214:4223           ESTABLISHED
> tcp        0      0 10.201.8.204:9000           10.201.8.212:49074          ESTABLISHED
> tcp        0      0 10.201.8.204:9000           10.201.8.206:11910          ESTABLISHED
> tcp        0      0 10.201.8.204:9000           10.201.8.210:62611          ESTABLISHED
> tcp        0      0 10.201.8.204:9000           10.201.8.213:1299           ESTABLISHED
> tcp        0      0 10.201.8.204:9000           10.201.8.205:9756           ESTABLISHED
> tcp        0      0 10.201.8.204:9000           10.201.8.207:59207          ESTABLISHED
>
> Here is output from ifconfig:
> bond0     Link encap:Ethernet  HWaddr 00:30:48:60:53:94
>          inet addr:10.201.8.204  Bcast:10.201.8.255  Mask:255.255.255.0
>          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>          RX packets:351496605 errors:0 dropped:1015 overruns:0 frame:0
>          TX packets:178144953 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:119420730164 (111.2 GiB)  TX bytes:120002123131 (111.7 GiB)
>
> eth0      Link encap:Ethernet  HWaddr 00:30:48:60:53:94
>          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>          RX packets:351496605 errors:0 dropped:1015 overruns:0 frame:0
>          TX packets:178144953 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:119420730164 (111.2 GiB)  TX bytes:120002123131 (111.7 GiB)
>          Interrupt:161
>
> eth1      Link encap:Ethernet  HWaddr 00:30:48:60:53:94
>          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
>          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>          Interrupt:169
>
> Has anyone encountered a similar issue?
>



-- 
Harsh J
www.harshj.com
