hadoop-common-user mailing list archives

From Bill Au <bill.w...@gmail.com>
Subject Re: decommissioned node showing up ad dead node in web based interface to namenode (dfshealth.jsp)
Date Wed, 04 Feb 2009 15:38:42 GMT
I have been looking into this some more by looking at the output of dfsadmin
-report during the decommissioning process.  After a node has been
decommissioned, dfsadmin -report shows that the node is in the
Decommissioned state, while the web interface dfshealth.jsp shows it as a dead
node.  After I removed the decommissioned node from the exclude file and ran
the refreshNodes command, the web interface continued to show it as a dead
node but dfsadmin -report showed the node to be in service.  After I restarted
HDFS, dfsadmin -report showed the correct information again.
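
For anyone who wants to track the state that dfsadmin -report gives for each
node, something like the following Python could pull it out of the report
text.  This is only a sketch: it assumes each datanode block in the report
starts with a "Name:" line followed by a "State :" (or, in other versions,
"Decommission Status :") line, which may not match every Hadoop release.
parse_report and decommissioned_nodes are hypothetical helper names.

```python
import re

def parse_report(report_text):
    """Map each datanode name to the state string printed in the report.

    Assumed layout (not verified against every Hadoop version):
        Name: host:port
        State          : In Service
    """
    states = {}
    current = None
    for line in report_text.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            # Everything after the first colon is the node's host:port.
            current = line.split(":", 1)[1].strip()
        elif current and re.match(r"^(State|Decommission Status)\s*:", line):
            states[current] = line.split(":", 1)[1].strip()
            current = None
    return states

def decommissioned_nodes(states):
    """Return the sorted names of nodes reported as Decommissioned."""
    return sorted(n for n, s in states.items() if s.lower() == "decommissioned")
```

The report text itself could come from running hadoop dfsadmin -report and
capturing its standard output, for example with subprocess.run.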

If I restart HDFS leaving the decommissioned node in the exclude file, the web
interface shows it as a dead node and dfsadmin -report shows it to be in
service.  But after I remove it from the exclude file and run the
refreshNodes command, both the web interface and dfsadmin -report show the
correct information.

It looks to me like I should only remove the decommissioned node from the
exclude file after restarting HDFS.
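
The exclude-file step could be scripted roughly as follows.  This is a minimal
sketch, assuming the exclude file lists one hostname per line; remove_host and
refresh_nodes are hypothetical helpers, and the only Hadoop command invoked is
hadoop dfsadmin -refreshNodes.  Restarting HDFS first is left to whatever
scripts the cluster already uses.

```python
import subprocess

def remove_host(exclude_text, host):
    """Return the exclude-file contents with every line naming `host` dropped.

    Assumes one hostname per line (an assumption about the file layout).
    """
    kept = [ln for ln in exclude_text.splitlines() if ln.strip() != host]
    return "\n".join(kept) + ("\n" if kept else "")

def refresh_nodes():
    """Ask the namenode to re-read its include/exclude files."""
    subprocess.run(["hadoop", "dfsadmin", "-refreshNodes"], check=True)
```

After HDFS has been restarted, the rewritten text from remove_host would be
written back to the exclude file and refresh_nodes() called once.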

I would still like to see the web interface report any decommissioned node
as decommissioned rather than dead, as is the case with dfsadmin -report.
I am willing to work on a patch for this.  Before I start, does anyone know
if this is already in the works?

Bill

On Mon, Feb 2, 2009 at 5:00 PM, Bill Au <bill.w.au@gmail.com> wrote:

> It looks like the behavior is the same with 0.18.2 and 0.19.0.  Even though
> I removed the decommissioned node from the exclude file and ran the
> refreshNodes command, the decommissioned node still shows up as a dead node.
> What I did notice is that if I leave the decommissioned node in the exclude
> file and restart HDFS, the node will show up as a dead node after restart.
> But then if I remove it from the exclude file and run the refreshNodes
> command, it will disappear from the status page (dfshealth.jsp).
>
> So it looks like I will have to stop and start the entire cluster in order
> to get what I want.
>
> Bill
>
>
> On Thu, Jan 29, 2009 at 5:40 PM, Bill Au <bill.w.au@gmail.com> wrote:
>
>> Not sure why but this does not work for me.  I am running 0.18.2.  I ran
>> hadoop dfsadmin -refreshNodes after removing the decommissioned node from
>> the exclude file.  It still shows up as a dead node.  I also removed it from
>> the slaves file and ran the refresh nodes command again.  It still shows up
>> as a dead node after that.
>>
>> I am going to upgrade to 0.19.0 to see if it makes any difference.
>>
>> Bill
>>
>>
>> On Tue, Jan 27, 2009 at 7:01 PM, paul <paulgnyc@gmail.com> wrote:
>>
>>> Once the nodes are listed as dead, if you still have the host names in your
>>> conf/exclude file, remove the entries and then run hadoop dfsadmin
>>> -refreshNodes.
>>>
>>>
>>> This works for us on our cluster.
>>>
>>>
>>>
>>> -paul
>>>
>>>
>>> On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bill.w.au@gmail.com> wrote:
>>>
>>> > I was able to decommission a datanode successfully without having to stop
>>> > my cluster.  But I noticed that after a node has been decommissioned, it
>>> > shows up as a dead node in the web-based interface to the namenode (i.e.
>>> > dfshealth.jsp).  My cluster is relatively small, and losing a datanode will
>>> > have a performance impact.  So I need to monitor the health of my cluster
>>> > and take steps to revive any dead datanode in a timely fashion.  So is
>>> > there any way to altogether "get rid of" any decommissioned datanode from
>>> > the web interface of the namenode?  Or is there a better way to monitor the
>>> > health of the cluster?
>>> >
>>> > Bill
>>> >
>>>
>>
>>
>
