ambari-dev mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: Blueprints - RCO - Related question.
Date Tue, 15 Mar 2016 04:54:40 GMT
I'm fine with a flag, but I would prefer RCO as the default, since the
default behavior changed only in the last 6 months.  It would be better
to restore the v1 behavior.

regards,
Eric

On Mon, Mar 14, 2016 at 5:55 PM, Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
wrote:

> I have created a placeholder JIRA documenting the feature and if we all
> agree let's do it.
> https://issues.apache.org/jira/browse/AMBARI-15417
>
> Thanks,
> Bhuvnesh Chaudhary
> Email: bchaudhary@pivotal.io
> Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
>
> On Mon, Mar 14, 2016 at 11:17 AM, Alejandro Fernandez <
> afernandez@hortonworks.com> wrote:
>
> > I agree configuring this with a flag is ideal.
> >
> > Thanks,
> > Alejandro
> >
> > From: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
> > Date: Monday, March 14, 2016 at 11:06 AM
> > To: Ambari <dev@ambari.apache.org>
> > Cc: Sumit Mohanty <smohanty@hortonworks.com>, Alejandro Fernandez <
> > afernandez@hortonworks.com>
> > Subject: Re: Blueprints - RCO - Related question.
> >
> > Thank you very much Robert for the detailed explanation. It helps
> > to understand the background.
> >
> > Regarding HAWQ capitalizing on retry: we can potentially make some
> > tweaks to verify whether HAWQ has already been initialized under the
> > current behavior, and change the way init is done so that it can make
> > use of retry.
> > Currently, a retry is attempted, but it has certain prerequisites that
> > fail after the first failed install attempt, so the retry is also
> > unsuccessful.
> > We will have to investigate this.
> >
> > Regarding alternatives:
> > Was the option of a flag in Blueprints to enable / disable RCO
> > considered? Say use_rco is true by default, and anyone who wants
> > different behavior can override it in the Blueprint.
> >
> > As Eric noted in the email above, in some cases retry can also
> > increase the amount of time required, due to:
> > 1) the number of retries before the operation either completes
> > successfully or fails completely
> > 2) cleanup steps that a service may need before a retry (as HAWQ
> > currently does); services must incorporate that logic.
> >
> > Also with RCO, the sequence of startup is predictable and all the
> > dependencies will be met.
> >
> > So making the use of RCO configurable in Blueprints would probably
> > satisfy both camps: those who want to use RCO and those who do not.
> > Your thoughts?
> >
> >
> >
> >
> > Thanks,
> > Bhuvnesh Chaudhary
> > Email: bchaudhary@pivotal.io
> > Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
> >
> > On Mon, Mar 14, 2016 at 9:18 AM, Eric Yang <eric818@gmail.com> wrote:
> >
> >> We have a use case where a service depends on Sqoop, Hive Metastore,
> >> HBase Client, and Hadoop Client on a worker node.  We found that
> >> Hadoop Client is sometimes not yet installed when our service
> >> installation has already started.  This looks like a big problem for
> >> our use case.  Is there a way to keep RCO by using a flag?  Parallel
> >> install with retries is the Chef and Puppet approach to configuring
> >> distributed, loosely coupled services that have no tight relationship
> >> between nodes.  It doesn't solve the problem of virtual services where
> >> a component depends on the availability of other services.  We have
> >> been scratching our heads over this since August last year.  It is
> >> good to know the problem so we can work out the kinks.
> >>
> >> Suppose a component is also so large that it takes 60 minutes to
> >> download and install.  We can bump up the retries for Hadoop Client to
> >> a very large number, but does this mean that while the large component
> >> is retrying, Hadoop Client may be installed in parallel, so that the
> >> second attempt of the large component could succeed?  It seems that in
> >> this use case the new optimization doesn't improve installation time,
> >> because Ambari frequently needs 120 minutes to complete the
> >> installation on the second attempt.
> >>
> >> regards,
> >> Eric
> >>
> >> On Mon, Mar 14, 2016 at 6:38 AM, Robert Nettleton <
> >> rnettleton@hortonworks.com> wrote:
> >>
> >> > Hi Bhuvnesh,
> >> >
> >> > You are correct.  The Blueprints deployment mechanism in Ambari no
> >> > longer relies on Role-command ordering to install or start
> >> > components across the cluster.
> >> >
> >> > This change to Blueprints was actually implemented in Ambari 2.1.0,
> >> > so it has been around for several releases now.  The new approach
> >> > was implemented to improve the performance of cluster deployments
> >> > and to provide better support for dynamic scaling of clusters.
> >> >
> >> > That being said, the new deployment mechanism does indeed remove the
> >> > guarantee of ordering, which can potentially cause some problems for
> >> > certain types of clusters.  There were also changes implemented on
> >> > the Ambari Agent side to mitigate this ordering problem.  The
> >> > ambari-agent will now retry INSTALL and START operations if those
> >> > operations happen to fail.  The START operation is probably the most
> >> > relevant in your case, and is also the operation that does show the
> >> > ordering issues you’ve mentioned in some deployments.
> >> >
> >> > The idea is that the ambari-agent retries should help to resolve any
> >> > issues with services starting in an unexpected order.
> >> >
> >> > This ambari-agent feature is on by default, but can be configured in
> >> > a more fine-grained fashion by setting some properties in
> >> > “cluster-env” in your Blueprint or Cluster Creation Template.
> >> >
> >> > Unfortunately, this is not documented very well, but the three
> >> > properties in question are set by default in the
> >> > BlueprintConfigurationProcessor in the following method:
> >> >
> >> > org.apache.ambari.server.controller.internal.BlueprintConfigurationProcessor#setRetryConfiguration
> >> >
> >> > The properties set in this method allow control over the types of
> >> > operations that are retried, the max number of retries attempted,
> >> > and the maximum amount of time that the agent should attempt a retry.
> >> >
> >> > We’ve seen many clusters using this new approach, and have not run
> >> > into that many problems with respect to ordering.
> >> >
> >> > One possible problem we’ve seen is in a small number of components
> >> > that launch services as a background command.  In that case, the
> >> > ambari-agent cannot detect that a retry is required, and so cannot
> >> > attempt a restart of a failed service.  This problem can usually be
> >> > resolved with component-specific retries.
> >> >
> >> > I don’t know much about the HAWQ component, but I would expect that
> >> > customizing the retry settings may help this problem.  Do the HAWQ
> >> > components implement retry attempts when booting up?
> >> >
> >> > Hope this helps.
> >> >
> >> > Thanks,
> >> > Bob
> >> >
> >> >
> >> >
> >> >
> >> > On Mar 11, 2016, at 7:18 PM, Alejandro Fernandez <
> >> > afernandez@hortonworks.com> wrote:
> >> >
> >> > > +others who have more insight into BluePrints
> >> > >
> >> > > On 3/11/16, 3:24 PM, "Bhuvnesh Chaudhary" <bchaudhary@pivotal.io>
> >> > > wrote:
> >> > >
> >> > >> Hello Sebastian, Alejandro, Andrew,
> >> > >>
> >> > >> Referring to the discussion on RB:
> >> > >> https://reviews.apache.org/r/43948
> >> > >> <https://reviews.apache.org/r/43948/#review120537>, it appears
> >> > >> that while deploying clusters using Blueprints, RCO is not honored.
> >> > >> Please confirm whether this understanding is correct.
> >> > >>
> >> > >> While running internal test suites for HAWQ, we deploy the clusters
> >> > >> using BP, and we need a specific order in which the HAWQ components
> >> > >> must be initialized / started.
> >> > >>
> >> > >> The "HAWQ Standby" component should be initialized after the "HAWQ
> >> > >> Master" component, as it has to copy the contents from HAWQ Master.
> >> > >> However, since RCO is not honored, we often run into issues because
> >> > >> HAWQ Standby starts / initializes before HAWQ Master.
> >> > >>
> >> > >> Could you please let us know whether there is any work already
> >> > >> under way to bring RCO dependencies into Blueprints? If not, is
> >> > >> there an alternative that can be used to enforce the dependency
> >> > >> locally, or something else that you would suggest?
> >> > >>
> >> > >> Thanks,
> >> > >> Bhuvnesh Chaudhary
> >> > >> Email: bchaudhary@pivotal.io
> >> > >> Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
> >> > >
> >> >
> >> >
> >>
> >
> >
>
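
Robert's message in the thread above says the agent retry behavior can be tuned through properties in "cluster-env" in a Blueprint or Cluster Creation Template, but does not name them. The sketch below shows roughly what such an override could look like. The three property names and the nesting under "properties" are assumptions recalled from the Ambari source, not taken from this thread, and should be checked against BlueprintConfigurationProcessor#setRetryConfiguration in the Ambari version in use. The helper name retry_cluster_env is hypothetical.

```python
import json


def retry_cluster_env(enabled=True, commands=("INSTALL", "START"), max_time_sec=600):
    """Build a 'cluster-env' configuration entry that overrides agent retries.

    ASSUMPTION: the property names below are recalled from the Ambari
    source and are not confirmed by the thread; verify them before use.
    """
    return {
        "cluster-env": {
            "properties": {
                # Whether the agent retries failed commands at all.
                "command_retry_enabled": "true" if enabled else "false",
                # Comma-separated list of operations that may be retried.
                "commands_to_retry": ",".join(commands),
                # Upper bound, in seconds, on the time spent retrying.
                "command_retry_max_time_in_sec": str(max_time_sec),
            }
        }
    }


if __name__ == "__main__":
    # Example: only retry START, and allow up to 20 minutes of retrying.
    fragment = {
        "configurations": [retry_cluster_env(commands=("START",), max_time_sec=1200)]
    }
    print(json.dumps(fragment, indent=2))
```

The printed fragment would be merged into the "configurations" list of a Blueprint; the rest of the Blueprint (host groups, stack information) is omitted here.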

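The thread contrasts two ways of bringing components up: a fixed order in which dependencies come first (RCO), and starting everything in parallel while retrying whatever fails. The toy model below illustrates that trade-off only; it is not Ambari's implementation. The dependency map is hypothetical except for the "HAWQ Standby after HAWQ Master" and "service needs Hadoop Client" relationships mentioned in the messages, and the component and function names are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: component -> components that must be up first.
DEPENDS_ON = {
    "HAWQ_STANDBY": {"HAWQ_MASTER"},
    "CUSTOM_SERVICE": {"HADOOP_CLIENT"},
    "HAWQ_MASTER": set(),
    "HADOOP_CLIENT": set(),
}


def ordered_start(depends_on):
    """RCO-style: one order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def parallel_start_with_retries(depends_on, max_retries=3):
    """Retry-style: every pending component is attempted in each round.

    An attempt succeeds only if all dependencies were already up when the
    round began. Returns the number of attempts each component needed.
    """
    started = set()
    attempts = {name: 0 for name in depends_on}
    pending = set(depends_on)
    while pending:
        ready = {c for c in pending if depends_on[c] <= started}
        for c in pending:
            attempts[c] += 1
        failed = pending - ready
        # One initial attempt plus max_retries retries is the budget.
        exhausted = sorted(c for c in failed if attempts[c] > max_retries)
        if exhausted:
            raise RuntimeError(f"retries exhausted for: {exhausted}")
        started |= ready
        pending = failed
    return attempts


if __name__ == "__main__":
    print("ordered:", ordered_start(DEPENDS_ON))
    print("attempts with retries:", parallel_start_with_retries(DEPENDS_ON))
```

In this model each dependent component needs two attempts. If one attempt takes 60 minutes, as in Eric's example, the second attempt alone accounts for the 120 minutes he describes, which is the cost a flag to keep RCO would avoid.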