hadoop-common-user mailing list archives

From john smith <js1987.sm...@gmail.com>
Subject Re: Map phase hanging for wordcount example
Date Tue, 06 Sep 2011 09:53:34 GMT
Yep, it works. I just synced the /etc/hosts files without changing any other
configs, and now it's working fine. Thanks for the help, Harsh. Sorry for
spamming without checking my TT logs properly.

Also, one more question: any idea why it's scheduling only a single reducer? I have
2 datanodes and I was expecting it to run 2 reducers (data size of 500 MB).

Any hints?


On Tue, Sep 6, 2011 at 3:17 PM, Harsh J <harsh@cloudera.com> wrote:

> John,
>
> Yes, looks like your slave nodes aren't able to properly resolve some
> hostnames. Hadoop requires a sane network setup to work properly.
> Also, yes, you need to use a hostname for your fs.default.name and
> other configs to the extent possible.
>
> The easiest way is to keep a properly synchronized /etc/hosts file.
>
> For example, it may look like so, on all machines:
>
> 127.0.0.1 localhost.localdomain localhost
> 192.168.0.1 master.hadoop master
> 192.168.0.2 slave3.hadoop slave3
> (and so on…)
>
> (This way the master can resolve the slaves, and the slaves can resolve the
> master. If you have the time, set up DNS; it's the best thing to do.)
>
> Then, in core-site.xml you'll need:
>
> fs.default.name = hdfs://master
>
> And in mapred-site.xml:
>
> mapred.job.tracker = master:8021
>
> That should do it, so long as the slave hosts can freely access the
> master hosts (no blockage of ports via firewall and such).
>
> On Tue, Sep 6, 2011 at 3:05 PM, john smith <js1987.smith@gmail.com> wrote:
> > Hey, my TT logs show this:
> >
> > 2011-09-06 13:22:41,860 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.UnknownHostException: unknown host: rip-pc.local
> > at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
> > at org.apache.hadoop.ipc.Client.getConnection(Client.java:853)
> > at org.apache.hadoop.ipc.Client.call(Client.java:723)
> > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> > at $Proxy5.getProtocolVersion(Unknown Source)
> > at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> > at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
> > at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
> > at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
> > at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >
> >
> > Maybe there is some error in the configs? I am using IPs in the conf files.
> > Should I put entries in the /etc/hosts files?
> >
> > On Tue, Sep 6, 2011 at 3:00 PM, john smith <js1987.smith@gmail.com> wrote:
> >
> >> Hi Harsh,
> >>
> >> My JT log: http://pastebin.com/rXAEeDkC
> >>
> >> I have some startup exceptions (which don't matter much, I guess), but the
> >> tail indicates that it's locating the splits correctly and then it hangs!
> >>
> >> Any idea?
> >>
> >> Thanks
> >>
> >>
> >> On Tue, Sep 6, 2011 at 1:30 PM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >>> I'd check the tail of JobTracker logs after a submit is done to see if
> >>> an error/warn there is causing this. And then dig further on
> >>> why/what/how.
> >>>
> >>> Hard to tell what your problem specifically is without logs :)
> >>>
> >>> On Tue, Sep 6, 2011 at 1:18 PM, john smith <js1987.smith@gmail.com> wrote:
> >>> > Hi Folks,
> >>> >
> >>> > I am working on a 3-node cluster (1 NN + 2 DNs). I loaded some test data
> >>> > with replication factor 3 (around 400 MB of data). However, when I run the
> >>> > wordcount example, it hangs at map 0%.
> >>> >
> >>> > bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar wordcount /test_data /out2
> >>> > 11/09/06 13:07:28 INFO input.FileInputFormat: Total input paths to process : 2
> >>> > 11/09/06 13:07:28 INFO mapred.JobClient: Running job: job_201109061248_0002
> >>> > 11/09/06 13:07:29 INFO mapred.JobClient:  map 0% reduce 0%
> >>> >
> >>> > TTs and DNs are running fine on my slaves. I see them running when I run
> >>> > the jps command.
> >>> >
> >>> >
> >>> > Can anyone help me out with this? Any idea why this would happen? I am
> >>> > totally clueless, as nothing shows up in the logs either!
> >>> >
> >>> > Thanks,
> >>> > jS
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>>
> >>
> >>
> >
>
>
>
> --
> Harsh J
>
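
For anyone hitting the same UnknownHostException: the quoted advice boils down to every node being able to resolve every cluster hostname. A rough way to check that on each machine might look like the sketch below. It is not part of Hadoop; `check_hosts` is a made-up helper name, and the hostnames in the usage note are simply the ones from the example /etc/hosts in the quoted reply, so substitute your own.

```python
import socket


def check_hosts(hostnames, resolver=socket.gethostbyname):
    """Return {hostname: IPv4 address, or None if the name does not resolve}.

    `resolver` defaults to the system resolver (which consults /etc/hosts
    and DNS); it is a parameter only so the function can be tried out
    without a real network. This helper is hypothetical, not a Hadoop API.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (name lookup failure) is a subclass of OSError.
            results[name] = None
    return results
```

Calling `check_hosts(["master", "slave3"])` on the master and on each slave should give the same non-None addresses everywhere. A None entry points at a missing or unsynchronized /etc/hosts line, which is what "unknown host: rip-pc.local" in the TT log suggested.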
