hadoop-common-user mailing list archives

From Rakhi Khatwani <rakhi.khatw...@gmail.com>
Subject Re: Setting up another machine as secondary node
Date Wed, 27 May 2009 12:18:32 GMT
Hi,
      Thanks for the suggestions, but my scenario is a little different.
I am doing a POC on namenode failover.

I have a 5-node cluster setup in which one node acts as the master, 3 act as
slaves, and the last one is the secondary node.

I start my hadoop dfs, write something into it... and later kill my
namenode (trying to reproduce a real-world scenario where my namenode fails
due to some hardware error).

So my aim is to start the secondary node as the primary machine,
so that the dfs is intact (by copying the checkpoint info)
and all the slave machines become slaves of the secondary namenode.

1. Can this be achieved without shutting down the cluster? I have read
this somewhere, but couldn't achieve it.

2. What are the step-by-step instructions to achieve it? I have googled it, got a
lot of different opinions, and am totally confused now.

Thanks,
Raakhi
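
For reference, the manual recovery this thread is circling around can be
sketched roughly as below. It is a dry run that only prints the commands, not
a tested failover procedure. HADOOP_BIN is a hypothetical install path; the
two directories are the ones that appear in the log quoted further down, and
are assumed to be what fs.checkpoint.dir and dfs.name.dir point to on the
machine taking over.

```shell
#!/bin/sh
# Sketch only: prints the manual recovery steps instead of running them.
# HADOOP_BIN is a hypothetical path; CHECKPOINT_DIR and NAME_DIR are assumed
# to match fs.checkpoint.dir and dfs.name.dir in hadoop-site.xml.
HADOOP_BIN="${HADOOP_BIN:-/opt/hadoop/bin}"
CHECKPOINT_DIR="${CHECKPOINT_DIR:-/tmp/hadoop-ithurs/dfs/namesecondary}"
NAME_DIR="${NAME_DIR:-/tmp/hadoop-ithurs/dfs/name}"

failover_plan() {
    # 1. Stop the secondary so it releases its lock on the checkpoint directory.
    echo "$HADOOP_BIN/hadoop-daemon.sh stop secondarynamenode"
    # 2. The import expects an empty name directory (no existing image in it).
    echo "mkdir -p $NAME_DIR"
    # 3. Start a name-node that loads the latest checkpoint from CHECKPOINT_DIR.
    echo "$HADOOP_BIN/hadoop-daemon.sh start namenode -importCheckpoint"
}

failover_plan
```

Two caveats with this sketch: anything written after the last checkpoint is
not in the imported image, and the data-nodes find the name-node through
fs.default.name, so the replacement has to answer at that address or the
data-nodes need to be reconfigured and restarted.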




On Tue, May 26, 2009 at 11:27 PM, Konstantin Shvachko <shv@yahoo-inc.com> wrote:

> Hi Rakhi,
>
> This is because your name-node is trying to -importCheckpoint from a
> directory which is locked by the secondary name-node.
> The secondary node is also running in your case, right?
> You should use -importCheckpoint as the last resort, when the name-node's
> directories are damaged.
> In the regular case you start the name-node with
> ./hadoop-daemon.sh start namenode
>
> Thanks,
> --Konstantin
>
>
> Rakhi Khatwani wrote:
>
>> Hi,
>>       I followed the instructions suggested by you all. but i still
>> come across this exception when i use the following command:
>> ./hadoop-daemon.sh start namenode -importCheckpoint
>>
>> the exception is as follows:
>> 2009-05-26 14:43:48,004 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = germapp/192.168.0.1
>> STARTUP_MSG:   args = [-importCheckpoint]
>> STARTUP_MSG:   version = 0.19.0
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
>> 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
>> ************************************************************/
>> 2009-05-26 14:43:48,147 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>> Initializing RPC Metrics with hostName=NameNode, port=44444
>> 2009-05-26 14:43:48,154 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>> germapp/192.168.0.1:44444
>> 2009-05-26 14:43:48,160 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=NameNode, sessionId=null
>> 2009-05-26 14:43:48,166 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
>> Initializing NameNodeMeterics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2009-05-26 14:43:48,316 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> fsOwner=ithurs,ithurs
>> 2009-05-26 14:43:48,317 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> supergroup=supergroup
>> 2009-05-26 14:43:48,317 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isPermissionEnabled=true
>> 2009-05-26 14:43:48,343 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>> Initializing FSNamesystemMetrics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2009-05-26 14:43:48,347 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStatusMBean
>> 2009-05-26 14:43:48,455 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Storage directory
>> /tmp/hadoop-ithurs/dfs/name is not formatted.
>> 2009-05-26 14:43:48,455 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
>> 2009-05-26 14:43:48,457 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage
>> /tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
>> 2009-05-26 14:43:48,460 ERROR
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
>> initialization failed.
>> java.io.IOException: Cannot lock storage
>> /tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
>>        at
>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
>>        at
>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
>> 2009-05-26 14:43:48,464 INFO org.apache.hadoop.ipc.Server: Stopping
>> server on 44444
>> 2009-05-26 14:43:48,466 ERROR
>> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
>> Cannot lock storage /tmp/hadoop-ithurs/dfs/namesecondary. The
>> directory is already locked.
>>        at
>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
>>        at
>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
>>
>> 2009-05-26 14:43:48,468 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at germapp/192.168.0.1
>> ************************************************************/
>>
>> any pointers/suggestions?
>> Thanks,
>> Raakhi
>>
>> On 5/20/09, Aaron Kimball <aaron@cloudera.com> wrote:
>>
>>> See this regarding instructions on configuring a 2NN on a separate machine
>>> from the NN:
>>>
>>> http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
>>>
>>> - Aaron
>>>
>>> On Thu, May 14, 2009 at 10:42 AM, Koji Noguchi
>>> <knoguchi@yahoo-inc.com> wrote:
>>>
>>>> Before 0.19, fsimage/edits were in the same directory.
>>>> So whenever the secondary finished checkpointing, it copied back the fsimage
>>>> while the namenode still kept on writing to the edits file.
>>>>
>>>> Usually we observed some latency on the namenode side during that time.
>>>>
>>>> HADOOP-3948 would probably help after 0.19 or later.
>>>>
>>>> Koji
>>>>
>>>> -----Original Message-----
>>>> From: Brian Bockelman [mailto:bbockelm@cse.unl.edu]
>>>> Sent: Thursday, May 14, 2009 10:32 AM
>>>> To: core-user@hadoop.apache.org
>>>> Subject: Re: Setting up another machine as secondary node
>>>>
>>>> Hey Koji,
>>>>
>>>> It's an expensive operation - for the secondary namenode, not the
>>>> namenode itself, right?  I don't particularly care if I stress out a
>>>> dedicated node that doesn't have to respond to queries ;)
>>>>
>>>> Locally we checkpoint+backup fairly frequently (not 5 minutes ...
>>>> maybe less than the default hour) due to sheer paranoia of losing
>>>> metadata.
>>>>
>>>> Brian
>>>>
>>>> On May 14, 2009, at 12:25 PM, Koji Noguchi wrote:
>>>>
>>>>>> The secondary namenode takes a snapshot
>>>>>> at 5 minute (configurable) intervals,
>>>>>>
>>>>> This is a bit too aggressive.
>>>>> Checkpointing is still an expensive operation.
>>>>> I'd say every hour or even every day.
>>>>>
>>>>> Isn't the default 3600 seconds?
>>>>>
>>>>> Koji
>>>>>
>>>>> -----Original Message-----
>>>>> From: jason hadoop [mailto:jason.hadoop@gmail.com]
>>>>> Sent: Thursday, May 14, 2009 7:46 AM
>>>>> To: core-user@hadoop.apache.org
>>>>> Subject: Re: Setting up another machine as secondary node
>>>>>
>>>>> any machine put in the conf/masters file becomes a secondary namenode.
>>>>>
>>>>> At some point there was confusion on the safety of more than one machine,
>>>>> which I believe was settled, as many are safe.
>>>>>
>>>>> The secondary namenode takes a snapshot at 5 minute (configurable)
>>>>> intervals, rebuilds the fsimage and sends that back to the namenode.
>>>>> There is some performance advantage of having it on the local machine, and
>>>>> some safety advantage of having it on an alternate machine.
>>>>> Could someone who remembers speak up on the single versus multiple secondary
>>>>> namenodes?
>>>>>
>>>>>
>>>>> On Thu, May 14, 2009 at 6:07 AM, David Ritch <david.ritch@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> First of all, the secondary namenode is not what you might think a
>>>>>> secondary is - it's not a failover device.  It does make a copy of the
>>>>>> filesystem metadata periodically, and it integrates the edits into the
>>>>>> image.  It does *not* provide failover.
>>>>>>
>>>>>> Second, you specify its IP address in hadoop-site.xml.  This is where you
>>>>>> can override the defaults set in hadoop-default.xml.
>>>>>>
>>>>>> dbr
>>>>>>
>>>>>> On Thu, May 14, 2009 at 9:03 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>   I wanna set up a cluster of 5 nodes in such a way that
>>>>>>> node1 - master
>>>>>>> node2 - secondary namenode
>>>>>>> node3 - slave
>>>>>>> node4 - slave
>>>>>>> node5 - slave
>>>>>>>
>>>>>>>
>>>>>>> How do we go about that?
>>>>>>> there is no property in hadoop-env where i can set the ip-address for
>>>>>>> the secondary name node.
>>>>>>>
>>>>>>> if i set node-1 and node-2 in masters, then when we start dfs the namenode
>>>>>>> and secondary namenode processes are present on both machines, but i think
>>>>>>> only node1 is active,
>>>>>>> and my namenode failover operation fails.
>>>>>>>
>>>>>>> any suggestions?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rakhi
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Alpha Chapters of my book on Hadoop are available
>>>>> http://www.apress.com/book/view/9781430219422
>>>>> www.prohadoopbook.com a community for Hadoop Professionals
>>>>>
>>>>
>>>>
>>
