hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@gmail.com>
Subject Re: NameNode failover procedure
Date Wed, 30 Jul 2008 18:53:18 GMT
I agree that NFS could have problems. We should ideally solve the
namenode HA issue without depending on NFS.

You can run the secondary namenode process on the same machine as the
primary. The primary namenode and the secondary namenode would
typically require the same amount of heap memory. If you allocate 8GB
of RAM to the primary, then you would ideally allocate another 8GB of
RAM to the secondary namenode.
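
As a rough illustration of that sizing rule, here is a minimal Python sketch
that checks whether one machine can hold both heaps. The function name and
the 2GB reserve for the operating system are assumptions made for this
example; they are not Hadoop settings or recommendations.

```python
# Hypothetical helper for illustration only; not part of Hadoop.
def fits_on_one_machine(physical_ram_gb, namenode_heap_gb, os_reserve_gb=2):
    """Return True if the primary and secondary namenode heaps fit in RAM.

    The secondary is given the same heap as the primary, following the
    rule of thumb above. os_reserve_gb is an assumed allowance for the
    operating system and other processes.
    """
    secondary_heap_gb = namenode_heap_gb
    required_gb = namenode_heap_gb + secondary_heap_gb + os_reserve_gb
    return required_gb <= physical_ram_gb


if __name__ == "__main__":
    # 8GB + 8GB + 2GB reserve = 18GB, which does not fit in 16GB.
    print(fits_on_one_machine(16, 8))  # False
    # The same 18GB requirement fits in 24GB.
    print(fits_on_one_machine(24, 8))  # True
```

With an 8GB primary heap, the machine needs at least 16GB for the two heaps
alone, before anything is set aside for the rest of the system.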

-dhruba

On Wed, Jul 30, 2008 at 11:48 AM, Himanshu Sharma <himsha@yahoo-inc.com> wrote:
>
> NFS seems to be a problem, since NFS locking causes the namenode to hang.
> Isn't there another way? Say the namenode starts writing synchronously to
> the secondary namenode in addition to its local directories; then, in case
> of namenode failover, we can start the primary namenode process on the
> secondary namenode, and the latest checkpointed fsimage is already there.
>
> This also raises a fundamental question: can we run the secondary
> namenode process on the same node as the primary namenode process without
> any out-of-memory or heap exceptions? Also, ideally, what should the memory
> size of the primary namenode be when it runs alone, and when it shares the
> node with the secondary namenode process?
>
>
> Andrzej Bialecki wrote:
>>
>> Dhruba Borthakur wrote:
>>> A good way to implement failover is to make the Namenode log transactions
>>> to more than one directory, typically a local directory and an NFS-mounted
>>> directory. The Namenode writes transactions to both directories
>>> synchronously.
>>>
>>> If the Namenode machine dies, copy the fsimage and fsedits from the NFS
>>> server and you will have recovered *all* committed transactions.
>>>
>>> The SecondaryNamenode pulls the fsimage and fsedits once every configured
>>> period, typically ranging from a few minutes to an hour. If you use the
>>> image from the SecondaryNamenode, you might lose the last few minutes of
>>> transactions.
>>
>> That's a good idea. But then, what's the purpose of running a secondary
>> namenode, if it can't guarantee that the data loss is minimal?
>> Shouldn't edits be written synchronously to a secondary namenode, and
>> fsimage updated synchronously whenever a primary namenode performs a
>> checkpoint?
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>   ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>>
>
