hadoop-common-user mailing list archives

From "Ming Yang" <minghs...@gmail.com>
Subject Re: Slow reduce task
Date Mon, 01 Oct 2007 17:56:14 GMT
Checked the log and found the following message:

2007-10-01 13:50:24,832 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = dv1/127.0.0.1
STARTUP_MSG:   args = []
************************************************************/
2007-10-01 13:50:25,030 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2007-10-01 13:50:25,446 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode namespaceID = 1118441396; datanode namespaceID = 774485407
        at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:294)
        at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:138)
        at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)
        at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:206)
        at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1391)
        at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1335)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1356)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1525)

2007-10-01 13:50:25,450 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at dv1/127.0.0.1
************************************************************/

It seems that the datanode didn't start successfully.
Should I re-format all the nodes? How can I prevent this from happening again?
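
For checking the other nodes, the mismatch reported above can be picked out
of a DataNode log mechanically. The following is a minimal Python sketch, not
part of Hadoop: the function name find_namespace_mismatch and the returned
dictionary keys are made up for illustration, and the pattern assumes only
the message wording shown in the log above.

```python
import re

# Matches the "Incompatible namespaceIDs" message as it appears in the
# log above. Whitespace between tokens may include line breaks, so the
# pattern also works on text that a mail client has re-wrapped.
PATTERN = re.compile(
    r"Incompatible namespaceIDs in\s+(?P<dir>\S+):\s+"
    r"namenode\s+namespaceID\s*=\s*(?P<namenode>\d+);\s*"
    r"datanode\s+namespaceID\s*=\s*(?P<datanode>\d+)"
)


def find_namespace_mismatch(log_text):
    """Return the data directory and both IDs, or None if no error is found.

    Hypothetical helper for illustration; it only inspects log text and
    does not touch any Hadoop files.
    """
    match = PATTERN.search(log_text)
    if match is None:
        return None
    return {
        "dir": match.group("dir"),
        "namenode": int(match.group("namenode")),
        "datanode": int(match.group("datanode")),
    }


if __name__ == "__main__":
    sample = (
        "java.io.IOException:\n"
        "Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode\n"
        "namespaceID = 1118441396;\n"
        "datanode namespaceID = 774485407\n"
    )
    print(find_namespace_mismatch(sample))
```

Running it against each node's DataNode log would show which data
directories report a namespaceID different from the namenode's.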

Thank you,

Ming

2007/10/1, Ted Dunning <tdunning@veoh.com>:
>
> You seem to have no data in your cluster.  I wouldn't think that would cause
> the hang that you observed, but it does limit how useful the cluster is.
>
>
> On 10/1/07 9:42 AM, "Ming Yang" <minghsien@gmail.com> wrote:
>
> > Below is the output from hadoop fsck / :
> >
> > Status: HEALTHY
> >  Total size:    0 B
> >  Total blocks:  0
> >  Total dirs:    6
> >  Total files:   0
> >  Over-replicated blocks:        0
> >  Under-replicated blocks:       0
> >  Target replication factor:     2
> >  Real replication factor:       0.0
> >
> >
> > The filesystem under path '/' is HEALTHY.
> >
> > ************************
> >
> > I am also wondering about the following. According to Google's MapReduce
> > paper, if a node fails or does not respond for a given amount of time,
> > the master will reassign the job to other nodes. Is that true of Hadoop's
> > implementation? I ask because I didn't see any job reassignment when a job
> > had been pending for too long.
> >
> > Thanks,
> >
> > Ming
> >
> > 2007/10/1, Ted Dunning <tdunning@veoh.com>:
> >>
> >> What does [hadoop fsck /] show?
> >>
> >>
> >> On 10/1/07 5:36 AM, "Ming Yang" <minghsien@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am using hadoop 0.14.1 on Ubuntu 7.04 (kernel version 2.6.20)
> >>> The Java version is 1.5.0.12. There are no failed tasks and no
> >>> lost task trackers. What I observed is the machine only finished
> >>> part of the reduce tasks and became idle. Could the issue come
> >>> from my HDFS since the status showed the transfer rate is so low?
> >>>
> >>> Thanks,
> >>>
> >>> Ming
> >>>
> >>>
> >>> 2007/9/30, Arun C Murthy <arunc@yahoo-inc.com>:
> >>>> Ming Yang,
> >>>>
> >>>> On Sun, Sep 30, 2007 at 01:13:07PM -0400, Ming Yang wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I set up a 2-node Hadoop cluster, whose nodes are all in
> >>>>> the same network and ran the 'grep' example. The map tasks
> >>>>> were distributed among the two machines and ran without any
> >>>>> problem. However, the reduce task, which is running at the slave
> >>>>> node, doesn't seem to finish and stops at 11%. I checked the
> >>>>> reduce task tracker and it shows the following message:
> >>>>>
> >>>>> reduce > copy (5 of 15 at 0.00 MB/s) >
> >>>>>
> >>>>> Can anyone let me know where the problem comes from?
> >>>>> and how to fix it? I really appreciate it!
> >>>>>
> >>>>
> >>>> Could you provide details on the hadoop version, platform etc.? Were
> >>>> there any failed tasks, lost task-trackers?
> >>>>
> >>>> Arun
> >>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Ming Yang
> >>>>
> >>
> >>
>
>
