hadoop-common-dev mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: Highly Available HDFS ???
Date Wed, 25 Mar 2009 20:51:25 GMT
We are running a real-timeish cluster that is configured as two overlapping
HDFS clusters. The namenodes run on two different machines, but the datanodes
run on the same set of slave machines. (Each slave machine actually runs
two datanode instances.) The entire storage space is shared between the two
clusters, and the setup also provides higher availability because there
are two namenodes. The downside is that there are two separate namespaces,
and the application has to handle this.
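
A minimal sketch of what "handling this" in the application might look like,
assuming the application simply tries one namespace and falls back to the
other. The namenode URIs and the with_failover helper are hypothetical names
used for illustration; they are not part of Hadoop or of our actual setup.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical namenode URIs (assumption: hosts and ports are placeholders).
NAMESPACES = ("hdfs://nn-a.example.com:8020", "hdfs://nn-b.example.com:8020")


def with_failover(namespaces: Sequence[str], op: Callable[[str], T]) -> T:
    """Run op against each namespace in order and return the first success.

    op receives the namespace URI and is expected to raise OSError
    (for example ConnectionError) when that namenode is unavailable.
    """
    errors = []
    for uri in namespaces:
        try:
            return op(uri)
        except OSError as exc:
            # Remember the failure and move on to the next namespace.
            errors.append((uri, exc))
    raise OSError(f"all namespaces failed: {errors}")
```

Because the two namespaces are separate, a file written through the second
namenode during an outage is not visible through the first one, so readers
would have to check both namespaces as well.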

thanks,
dhruba

On Wed, Mar 25, 2009 at 12:25 PM, Sanjay Radia <sradia@yahoo-inc.com> wrote:

>
> On Mar 25, 2009, at 12:07 PM, Sangmin Lee wrote:
>
>> Hi all,
>>
>> I am wondering if there are any efforts or plans for HA (Highly Available)
>> HDFS out there.
>> Currently, the NameNode is a single point of failure, and recovery requires
>> human intervention.
>>
> Many (and probably most) users of Hadoop are using HDFS for batch
> processing. As a result, HA for the name node has not received as high a
> priority as other projects, since batch jobs can wait while the name node
> is restarting.
> Clearly this is not acceptable for non-batch use of HDFS.
>
>
> Suresh has a rough prototype of an HA'ed Namenode using Linux HA that he is
> planning to put in contrib one of these days (it is a low-priority
> background task for him).
>
> Sorry that I don't have a better answer.
>
> sanjay
>
>
>
>> In addition, the recovered NameNode may not be the same as the one before
>> the failure.
>> Are there any plans or ongoing efforts to improve this?
>>
>> Thanks,
>> Sangmin
>>
>>
>
