hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: How to rebuild the shared edits directory
Date Wed, 25 Jul 2012 18:51:10 GMT
Hi Jeff,

I don't know the HP offerings very well myself, but I know some of our
customers are successfully using lower end NetApp devices.

You should also be aware that work on the NAS-less shared storage is
well under way: HDFS-3077. So if your timeline is more than a few
months out to production, you may consider waiting for it to get your
HA setup running.

-Todd

On Tue, Jul 24, 2012 at 12:05 PM, Jeff Whiting <jeffw@qualtrics.com> wrote:
> Todd or anyone who knows,
>
> I'm reviving an old thread because we are colocating into a data center
> rather than just using the cloud. You mentioned "We currently require the
> NFS directory to be highly available itself. This is achievable with even
> pretty inexpensive NAS devices from your vendor of choice." What hardware
> would you suggest that would give us an HA filer? Specifically, we are going
> all HP in the colo.
>
> I've looked around and was unable to find any suggestions. The docs just
> say "high-quality dedicated NAS appliance." Any suggestions would be great!
>
> https://ccp.cloudera.com/display/CDH4DOC/HDFS+High+Availability+Hardware+Configuration
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419
>
> Thanks,
> ~Jeff
>
>
> On 5/8/2012 6:49 PM, Todd Lipcon wrote:
>>
>> Hi Jeff,
>>
>> Check out HDFS-3077. We'll probably need the most help when it comes
>> time to do testing. Any testing you can do on the current HA solution,
>> non-ideal as it may be, is also immensely valuable. For example, if
>> you can reproduce the case where it didn't exit upon loss of shared
>> edits, that would also be a bug which would hit the quorum-based
>> solution.
>>
>> Thanks
>> -Todd
>>
>> On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <jeffw@qualtrics.com> wrote:
>>>
>>> Thanks for being patient and listening to my rants. I'm excited to see
>>> HDFS continue to move forward. If the organization I'm working for were
>>> willing to spend some resources to help speed this process up, where
>>> should we start looking? I'm sure there are quite a few JIRAs on these
>>> issues.
>>>
>>> Most of what we've done with the Hadoop ecosystem has been ZooKeeper and
>>> HBase related.
>>>
>>> Thanks,
>>> ~Jeff
>>>
>>>
>>> On 5/8/2012 2:46 PM, Todd Lipcon wrote:
>>>>
>>>> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting<jeffw@qualtrics.com>
>>>> wrote:
>>>>>
>>>>> It seems the NN was originally written with the assumption that disks
>>>>> fail and stuff happens. Hence the ability to have multiple directories
>>>>> store your NN data, even though each directory is most likely
>>>>> redundant / HA.
>>>>>
>>>>> [start rant]
>>>>>
>>>>> My opinion is that it is a step backwards that the shared edits wasn't
>>>>> written with the same assumptions. If any one problem can take out your
>>>>> cluster then it isn't HA. So allowing a single NFS failure to take down
>>>>> your cluster and saying "make NFS HA" just seems to move the HA problem,
>>>>> not solve it. I would expect a true HA solution to be completely
>>>>> self-contained within the Hadoop ecosystem. All machines fail
>>>>> eventually, and that needs to be planned for. At a minimum, a failure of
>>>>> the shared edits should only disable failover and provide a recovery
>>>>> mechanism; ideally the NN should have been rewritten to be a cluster
>>>>> (similar to ZooKeeper or Ceph) to enable HA.
>>>>>
>>>>> [end rant]
>>>>
>>>> Like I said earlier in the thread, work is already under way on this
>>>> and should be complete within a number of months.
>>>>
>>>> In many practical deployments, what we have already can provide
>>>> complete HA. In others, like the AWS example you mentioned, we need a
>>>> bit more, and we're working on it. Hang on a bit longer and it will be
>>>> good to go.
>>>>
>>>> -Todd
>>>>
>>>>> Sorry for the rant. I just really want to see HDFS become a complete HA
>>>>> system without caveats.
>>>>>
>>>>> ~Jeff
>>>>>
>>>>>
>>>>> On 5/8/2012 11:44 AM, Todd Lipcon wrote:
>>>>>>
>>>>>> On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook
>>>>>> <nathanielc@qualtrics.com>    wrote:
>>>>>>>
>>>>>>> We ran the initializeSharedEdits command and it didn't have any
>>>>>>> effect, but that may be because of the weird state we got it in.
>>>>>>>
>>>>>>> So help me understand: I was under the assumption that if shared
>>>>>>> edits went away you would lose the ability to fail over and that is
>>>>>>> it. The active namenode would still function but would not fail over,
>>>>>>> and all standby namenodes would not try to become active. Is this
>>>>>>> correct?
>>>>>>
>>>>>> Unfortunately that's not the case. If you lose shared edits, your
>>>>>> cluster should shut down. We currently require the NFS directory to be
>>>>>> highly available itself. This is achievable with even pretty
>>>>>> inexpensive NAS devices from your vendor of choice.
>>>>>>
>>>>>> The reason for this behavior is as follows: if the active node loses
>>>>>> access to the mount, it's unable to distinguish whether the mount
>>>>>> itself died or if the node just had a local issue which broke the
>>>>>> mount. Imagine for example that the NFS client had a bug which caused
>>>>>> the mount to go away. Then, you'd continue running for quite some time
>>>>>> without writing to shared edits. If your NN then crashed, a failover
>>>>>> would cause you to revert to an old version of the namespace, and
>>>>>> you'd have a case of permanent data loss due to divergence of the
>>>>>> image before and after failover.
>>>>>>
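As an illustration of the failure mode described in the quoted paragraph above, here is a minimal toy model in Python. It is not HDFS code: the `NameNode` class, the `abort_on_shared_loss` flag, and the `simulate` function are hypothetical names invented for this sketch, and transactions are reduced to integer IDs. It only shows why an active node that keeps accepting edits without reaching shared storage leaves a standby with a stale namespace.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy active NameNode: logs each edit locally and to a shared list."""
    abort_on_shared_loss: bool
    local_edits: list = field(default_factory=list)
    alive: bool = True

    def apply_edit(self, txid, shared_edits, mount_ok):
        """Record one edit; return False if the node aborted instead."""
        if not mount_ok and self.abort_on_shared_loss:
            # Shared storage unreachable: stop rather than diverge.
            self.alive = False
            return False
        self.local_edits.append(txid)
        if mount_ok:
            shared_edits.append(txid)
        return True


def simulate(abort_on_shared_loss):
    """Return (acknowledged edits, shared edits, edits lost on failover)."""
    shared = []
    active = NameNode(abort_on_shared_loss)
    for txid in (1, 2, 3):          # mount healthy
        active.apply_edit(txid, shared, mount_ok=True)
    for txid in (4, 5):             # mount silently broken on the active node
        if not active.apply_edit(txid, shared, mount_ok=False):
            break
    # The active node now crashes; a standby can only replay the shared edits.
    acked = list(active.local_edits)
    lost = [t for t in acked if t not in shared]
    return acked, list(shared), lost


if __name__ == "__main__":
    print("keep running:", simulate(abort_on_shared_loss=False))
    print("abort on loss:", simulate(abort_on_shared_loss=True))
```

In this model, continuing to run acknowledges edits 4 and 5 that the standby never sees, while aborting on the first failed write leaves nothing acknowledged that is missing from shared storage.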
>>>>>> There's work under way to remove this restriction which should be
>>>>>> available for general use some time this summer or early fall, if I
>>>>>> had to take a guess on timeline.
>>>>>>
>>>>>>> If it is the case that namenodes quit when they lose connection to
>>>>>>> the shared edits dir, then doesn't the shared edits dir become the
>>>>>>> new single point of failure?
>>>>>>
>>>>>> Yes, but it's an easy one to resolve. Most of our customers already
>>>>>> have a NAS device in their datacenter, which has dual heads, dual
>>>>>> PDUs, etc, and at least 5 9s of uptime. This HA setup is basically the
>>>>>> same as you see in most enterprise HA systems which rely on shared
>>>>>> storage.
>>>>>>
>>>>>>> Unfortunately we have cleared the logs from this test but we could
>>>>>>> try to reproduce it.
>>>>>>
>>>>>> That would be great, thanks!
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>>> On Tue, May 8, 2012 at 10:28 AM, Todd Lipcon<todd@cloudera.com>
>>>>>>>   wrote:
>>>>>>>>
>>>>>>>> On Tue, May 8, 2012 at 7:46 AM, Nathaniel
>>>>>>>> Cook<nathanielc@qualtrics.com>
>>>>>>>>   wrote:
>>>>>>>>>
>>>>>>>>> We have been working with an HA HDFS cluster, testing several
>>>>>>>>> failover scenarios. We have a small cluster of 4 machines spun up
>>>>>>>>> for testing. We run a namenode on two of the machines and host an
>>>>>>>>> NFS share on the third for the shared edits directory. The fourth
>>>>>>>>> machine is just a datanode. We configured the cluster for automatic
>>>>>>>>> failover using ZKFC. We can start and stop the namenodes with no
>>>>>>>>> problems, and failover happens as expected. Then we tested breaking
>>>>>>>>> the shared edits directory. We stopped the NFS share and then
>>>>>>>>> re-enabled it. This caused the loss of a few edits.
>>>>>>>>
>>>>>>>> Really? What mount options are you using on your NFS mount?
>>>>>>>>
>>>>>>>> The active NN should abort immediately if the shared edits dir
>>>>>>>> disappears. Do you have logs available from your NNs during this
>>>>>>>> time?
>>>>>>>>
>>>>>>>>> This had no effect, as expected, on the namenodes, and the
>>>>>>>>> cluster functioned normally.
>>>>>>>>
>>>>>>>> On the contrary, I'd expect the NN to bail out on the next edit
>>>>>>>> (since it has no place to reliably fsync it).
>>>>>>>>
>>>>>>>>> We stopped the standby namenode and tried to start it again, but it
>>>>>>>>> would not start because of the missing edits. No matter what we
>>>>>>>>> tried we could not rebuild the shared edits directory and thus get
>>>>>>>>> the second namenode back online. In this state the HDFS cluster
>>>>>>>>> continued to function but it was no longer an HA cluster. To get
>>>>>>>>> the cluster back in HA mode we had to reformat the namenode data
>>>>>>>>> with the shared edits. In this case how do you rebuild the shared
>>>>>>>>> edits data so you can get the cluster back to an HA mode?
>>>>>>>>
>>>>>>>> It sounds like something went wrong with the facility that's
>>>>>>>> supposed to make the active NN crash if shared edits go away. The
>>>>>>>> logs will help.
>>>>>>>>
>>>>>>>> To answer your question, though, you can run the
>>>>>>>> "initializeSharedEdits" process again to re-initialize that edits
>>>>>>>> dir.
>>>>>>>>
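For reference, the re-initialization suggested in the quoted reply above can be written out as a short checklist. The Python sketch below only prints commands; it does not run them. `hdfs namenode -initializeSharedEdits` is the command named in the thread, and `hdfs namenode -bootstrapStandby` is the standard command for re-syncing a standby. The ordering, the host descriptions, and the function names `rebuild_plan` and `format_plan` are assumptions made for illustration, and the thread itself reports a case where re-initializing had no effect.

```python
def rebuild_plan():
    """Return (where to run, argv) steps; the ordering is an assumption."""
    return [
        # Re-seed the shared edits dir from the NameNode whose local
        # metadata is intact (assumed to be stopped while this runs).
        ("NameNode holding the good metadata",
         ["hdfs", "namenode", "-initializeSharedEdits"]),
        # Re-sync the other NameNode before starting it as standby.
        ("other NameNode",
         ["hdfs", "namenode", "-bootstrapStandby"]),
    ]


def format_plan(plan):
    """Render the plan as numbered, human-readable lines."""
    return [
        f"{i}. on {where}: {' '.join(argv)}"
        for i, (where, argv) in enumerate(plan, 1)
    ]


if __name__ == "__main__":
    for line in format_plan(rebuild_plan()):
        print(line)
```

Keeping this as a printed plan rather than an automated script is deliberate: the two steps run on different hosts, and each should be checked against the NameNode logs before moving on.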
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -Nathaniel Cook
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Jeff Whiting
>>>>> Qualtrics Senior Software Engineer
>>>>> jeffw@qualtrics.com
>>>>>
>>>>
>>> --
>>> Jeff Whiting
>>> Qualtrics Senior Software Engineer
>>> jeffw@qualtrics.com
>>>
>>
>>
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
