hadoop-mapreduce-user mailing list archives

From Colin Kincaid Williams <disc...@uw.edu>
Subject Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
Date Thu, 31 Jul 2014 19:35:31 GMT
Hi Jing,

Thanks for the response. I will try this out, and file an Apache jira.

Best,

Colin Williams


On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:

> Hi Colin,
>
>     Currently, I think we have to restart almost all of the
> daemons/services in order to swap out a standby NameNode (SBN):
>
> 1. The current active NameNode (ANN) needs to know about the new SBN. In
> the current implementation the SBN periodically sends a rollEditLog RPC
> request to the ANN, so if an NN failover happens later, the original ANN
> (by then the standby) needs to send this RPC to the correct NN.
> 2. It looks like the DataNode cannot currently refresh its list of NNs
> while it is running. Look at the code in BPOfferService:
>
>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>     for (BPServiceActor actor : bpServices) {
>       oldAddrs.add(actor.getNNSocketAddress());
>     }
>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>
>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>       // Keep things simple for now -- we can implement this at a later date.
>       throw new IOException(
>           "HA does not currently support adding a new standby to a running DN. " +
>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>     }
>   }
>
> 3. If you're using automatic failover, you also need to update the
> configuration of the ZKFC on the current ANN machine, since the ZKFC does
> graceful fencing by sending an RPC to the other NN.
> 4. It looks like we do not need to restart the JournalNodes for the new
> SBN, but I have not tried this.
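For readers following along, the check in the refreshNNList snippet quoted above can be reproduced with only the JDK. This is a minimal sketch, not the Hadoop source: the class name NNListCheck, the helper method names, and the hosts nn1/nn2/nn3.example.com are made up for illustration, and Guava's Sets.symmetricDifference is replaced here by a small hand-written helper.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NNListCheck {

    // Returns the elements that appear in exactly one of the two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        for (T item : b) {
            // Present in both sets: drop it. Present only in b: keep it.
            if (!result.remove(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // True when the configured NN list differs from the list in use, which is
    // the condition under which the quoted refreshNNList throws IOException.
    static boolean nnListChanged(Set<InetSocketAddress> inUse,
                                 List<InetSocketAddress> configured) {
        return !symmetricDifference(inUse, new HashSet<>(configured)).isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical hosts; createUnresolved avoids any DNS lookup.
        InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress nn2 = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
        InetSocketAddress nn3 = InetSocketAddress.createUnresolved("nn3.example.com", 8020);

        Set<InetSocketAddress> inUse = new HashSet<>(Arrays.asList(nn1, nn2));

        // Same NameNodes in a different order: no change detected.
        System.out.println(nnListChanged(inUse, Arrays.asList(nn2, nn1)));
        // Standby nn2 swapped for nn3: change detected.
        System.out.println(nnListChanged(inUse, Arrays.asList(nn1, nn3)));
    }
}
```

The point of the sketch is that replacing one standby changes the address set, so a running DN takes the exception path rather than picking up the new NN.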
>
>     So in general we still have to update the configuration of all the
> services (except the JNs) and restart them. I think this can be done as a
> rolling restart, though:
> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
> restart of all the DNs to update their configurations.
> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
> their configuration. The new SBN should become active.
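The ordering of the three steps above can be written down as a small checklist generator. This is only a sketch of the untested procedure, under stated assumptions: the class name StandbySwapPlan and the host names are hypothetical, and the commands shown (hadoop-daemon.sh and hdfs namenode -bootstrapStandby) assume a plain Hadoop 2.x install; packaged distributions may use different service scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StandbySwapPlan {

    // Builds the ordered list of "host: action" steps for swapping a standby NN.
    static List<String> buildPlan(String oldSbn, String newSbn, String ann,
                                  List<String> dataNodes) {
        List<String> steps = new ArrayList<>();

        // Step 1: retire the old standby, then bootstrap and start the new one.
        steps.add(oldSbn + ": hadoop-daemon.sh stop namenode");
        steps.add(newSbn + ": hdfs namenode -bootstrapStandby");
        steps.add(newSbn + ": hadoop-daemon.sh start namenode");

        // Step 2: rolling restart of the DNs, one at a time, while the ANN
        // and its ZKFC keep running.
        for (String dn : dataNodes) {
            steps.add(dn + ": update hdfs-site.xml, then hadoop-daemon.sh stop datanode");
            steps.add(dn + ": hadoop-daemon.sh start datanode");
        }

        // Step 3: only after every DN is done, stop the ANN and its ZKFC and
        // update their configuration.
        steps.add(ann + ": hadoop-daemon.sh stop namenode");
        steps.add(ann + ": hadoop-daemon.sh stop zkfc");
        steps.add(ann + ": update hdfs-site.xml");
        return steps;
    }

    public static void main(String[] args) {
        // Hypothetical host names, for illustration only.
        List<String> plan = buildPlan("nn2.example.com", "nn3.example.com",
                "nn1.example.com", Arrays.asList("dn1.example.com", "dn2.example.com"));
        for (int i = 0; i < plan.size(); i++) {
            System.out.println((i + 1) + ". " + plan.get(i));
        }
    }
}
```

Generating the list rather than hard-coding it keeps the one ordering constraint explicit: the ANN is not touched until every DN has been restarted with the new configuration.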
>
>     I have not tried the steps above, so please let me know whether this
> works. I also think we should document the correct steps in
> Apache. Could you please file an Apache jira?
>
> Thanks,
> -Jing
>
>
>
> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu>
> wrote:
>
>> Hello,
>>
>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>> believe the steps to achieve this would be something similar to:
>>
>> Use the Bootstrap standby command to prep the replacement standby, or
>> rsync if the command fails.
>>
>> Somehow update the datanodes, so they push the heartbeat / journal to the
>> new standby
>>
>> Update the xml configuration on all nodes to reflect the replacement
>> standby.
>>
>> Start the replacement standby
>>
>> Use some hadoop command to refresh the datanodes to the new NameNode
>> configuration.
>>
>> I am not sure how to deal with the Journal switch, or if I am going about
>> this the right way. Can anybody give me some suggestions here?
>>
>>
>> Regards,
>>
>> Colin Williams
>>
>>
>
