hadoop-common-user mailing list archives

From "Ming Yang" <minghs...@gmail.com>
Subject Re: Slow reduce task
Date Mon, 01 Oct 2007 19:08:35 GMT
Answering my own question:

I deleted /tmp/hadoop-root on all nodes and reformatted the namenode,
and then it worked.
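
For anyone checking their own logs for the same failure, here is a minimal Python sketch that pulls the two IDs out of the error quoted further down. The pattern is copied from the wording in that one log line and is an assumption about the message format; it may differ in other Hadoop versions. The function name `find_namespace_mismatch` is made up for this example.

```python
import re

# Pattern based on the error text in the quoted DataNode log below.
# The exact wording is an assumption taken from that single log entry.
_MISMATCH = re.compile(
    r"Incompatible namespaceIDs in (?P<dir>\S+):\s*"
    r"namenode\s+namespaceID\s*=\s*(?P<namenode>\d+);\s*"
    r"datanode\s+namespaceID\s*=\s*(?P<datanode>\d+)"
)


def find_namespace_mismatch(log_text):
    """Return (storage_dir, namenode_id, datanode_id), or None if absent."""
    # The message is wrapped over several lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    match = _MISMATCH.search(flat)
    if match is None:
        return None
    return (
        match.group("dir"),
        int(match.group("namenode")),
        int(match.group("datanode")),
    )


# Sample text copied from the log in this thread (without the "> " quoting).
sample = """java.io.IOException:
Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode
namespaceID = 1118441396;
datanode namespaceID = 774485407"""

print(find_namespace_mismatch(sample))
```

A non-None result means the datanode's storage directory was written under a different namenode format than the one now running, which is the situation that deleting /tmp/hadoop-root and reformatting cleared here.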

Also, when I kick off the job on the master node, the reduce task gets
stuck partway through, but if the job is started on the slave node, it
finishes without any problem.
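
One detail in the quoted log that might be worth checking, though nothing in this thread confirms it as the cause: the DataNode reports `host = dv1/127.0.0.1`, meaning the hostname dv1 resolved to a loopback address on that machine. A small Python sketch that flags such lines in a log is below; the function name `loopback_hosts` and the second host entry (`dv2/192.168.0.12`) are hypothetical and only there for illustration.

```python
import ipaddress
import re

# Matches the "host = name/address" line of a STARTUP_MSG banner,
# in the form shown in the quoted DataNode log.
_HOST = re.compile(r"STARTUP_MSG:\s+host = (?P<name>[^/\s]+)/(?P<addr>\S+)")


def loopback_hosts(log_text):
    """Return hostnames that a STARTUP_MSG reports with a loopback address."""
    flagged = []
    for match in _HOST.finditer(log_text):
        try:
            addr = ipaddress.ip_address(match.group("addr"))
        except ValueError:
            # Not a parseable IP address; skip this line.
            continue
        if addr.is_loopback:
            flagged.append(match.group("name"))
    return flagged


# First line is from the log in this thread; the second is a made-up example.
sample_log = """STARTUP_MSG:   host = dv1/127.0.0.1
STARTUP_MSG:   host = dv2/192.168.0.12"""

print(loopback_hosts(sample_log))
```

If a host shows up in the result, comparing its /etc/hosts entry against the address the other node uses to reach it would be a reasonable next step.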

Any idea?

Thanks,

Ming


2007/10/1, Ming Yang <minghsien@gmail.com>:
> Checked the log and found the following message:
>
> 2007-10-01 13:50:24,832 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = dv1/127.0.0.1
> STARTUP_MSG:   args = []
> ************************************************************/
> 2007-10-01 13:50:25,030 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=DataNode, sessionId=null
> 2007-10-01 13:50:25,446 ERROR org.apache.hadoop.dfs.DataNode:
> java.io.IOException:
> Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode
> namespaceID = 1118441396;
> datanode namespaceID = 774485407
>         at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:294)
>         at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:138)
>         at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)
>         at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:206)
>         at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1391)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1335)
>         at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1356)
>         at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1525)
>
> 2007-10-01 13:50:25,450 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at dv1/127.0.0.1
> ************************************************************/
>
> It seems that the datanode didn't start successfully.
> Should I re-format all the nodes? How can I prevent this from happening again?
>
> Thank you,
>
> Ming
>
> 2007/10/1, Ted Dunning <tdunning@veoh.com>:
> >
> > You seem to have no data in your cluster.  I wouldn't think that would cause
> > the hang that you observed, but it does limit how useful the cluster is.
> >
> >
> > On 10/1/07 9:42 AM, "Ming Yang" <minghsien@gmail.com> wrote:
> >
> > > Below is the output from hadoop fsck / :
> > >
> > > Status: HEALTHY
> > >  Total size:    0 B
> > >  Total blocks:  0
> > >  Total dirs:    6
> > >  Total files:   0
> > >  Over-replicated blocks:        0
> > >  Under-replicated blocks:       0
> > >  Target replication factor:     2
> > >  Real replication factor:       0.0
> > >
> > >
> > > The filesystem under path '/' is HEALTHY.
> > >
> > > ************************
> > >
> > > I am also wondering about something else. According to Google's paper
> > > on MapReduce, if a node fails or stops responding for a given amount of
> > > time, the master reassigns its work to other nodes. Is that also true of
> > > Hadoop's implementation? I didn't see any reassignment when a job
> > > had been pending for too long.
> > >
> > > Thanks,
> > >
> > > Ming
> > >
> > > 2007/10/1, Ted Dunning <tdunning@veoh.com>:
> > >>
> > >> What does [hadoop fsck /] show?
> > >>
> > >>
> > >> On 10/1/07 5:36 AM, "Ming Yang" <minghsien@gmail.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > > >>> I am using hadoop 0.14.1 on Ubuntu 7.04 (kernel version 2.6.20).
> > > >>> The Java version is 1.5.0.12. There are no failed tasks and no
> > > >>> lost task trackers. What I observed is that the machine finished only
> > > >>> part of the reduce tasks and then became idle. Could the issue come
> > > >>> from my HDFS, since the status showed the transfer rate is so low?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Ming
> > >>>
> > >>>
> > >>> 2007/9/30, Arun C Murthy <arunc@yahoo-inc.com>:
> > >>>> Ming Yang,
> > >>>>
> > >>>> On Sun, Sep 30, 2007 at 01:13:07PM -0400, Ming Yang wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> I set up a 2-node Hadoop cluster, whose nodes are all in
> > >>>>> the same network and ran the 'grep' example. The map tasks
> > >>>>> were distributed among the two machines and ran without any
> > > >>>>> problem. However, the reduce task, which is running at the slave
> > > >>>>> node, doesn't seem to finish and stops at 11%. I checked the
> > >>>>> reduce task tracker and it shows the following message:
> > >>>>>
> > >>>>> reduce > copy (5 of 15 at 0.00 MB/s) >
> > >>>>>
> > >>>>> Can anyone let me know where the problem comes from?
> > >>>>> and how to fix it? I really appreciate it!
> > >>>>>
> > >>>>
> > > >>>> Could you provide details on the hadoop version, platform etc.?
> > > >>>> Were there any failed tasks, lost task-trackers?
> > >>>>
> > >>>> Arun
> > >>>>
> > >>>>> Thank you,
> > >>>>>
> > >>>>> Ming Yang
> > >>>>
> > >>
> > >>
> >
> >
>
