falcon-dev mailing list archives

From John Yu <johnyu0...@gmail.com>
Subject Re: On configuring two source clusters due to colo requirement
Date Thu, 17 Jul 2014 21:30:52 GMT
Just done.
https://issues.apache.org/jira/browse/FALCON-511

Thanks!
John


2014-07-15 22:22 GMT-07:00 Shwetha GS <shwetha.gs@inmobi.com>:

> Hi John,
>
> We didn't have any use case for this kind of replication, so we didn't
> consider it. It's a valid use case; can you file a JIRA to track it?
>
> Thanks,
> Shwetha
>
>
> On Wed, Jul 16, 2014 at 1:00 AM, John Yu <johnyu0520@gmail.com> wrote:
>
> > Hey Satish,
> >
> > Thanks for your reply!
> >
> > I can see how setting up that way would definitely work.
> > Also, it is probably technically more correct, since data generated by
> > different processes should be considered different.
> >
> > However, we are thinking along the lines of data discovery, in which a
> > critical dataset might be computed on different colos simultaneously for
> > both DR and load balancing purposes.  In this scenario, we would somehow
> > like the end users to know that feed1 and feed2 are logically the same
> > data, and they are free to pick one to use.
> >
> > Just wondering whether it makes sense to support multiple sources and
> > multiple targets without specifying a partition (perhaps each target
> > cluster would have to specify the order of sources from which to copy).
> > Also, I am guessing that this "multiple sources and multiple targets
> > without specifying a partition" requirement must have come up before. What
> > was the thought process behind not supporting it in the end?
> >
> > Thanks a lot!
> > John
> >
> >
> > 2014-07-14 21:34 GMT-07:00 Satish Mittal <satish.mittal@inmobi.com>:
> >
> > > Hi,
> > >
> > > Given that both ETL clusters produce the same dataset independently of
> > > each other, and the aim is to replicate the dataset within each colo (to
> > > avoid any cross-colo data movement), you could simply have 2 instances of
> > > the same feed, one per colo:
> > >
> > > feed1:
> > > <cluster name="colo1ETL" type="source">
> > > <cluster name="colo1A" type="target">
> > >
> > > feed2:
> > > <cluster name="colo2ETL" type="source">
> > > <cluster name="colo2A" type="target">
> > >
> > > The 1st error occurred because multiple-source replication was configured,
> > > which requires partition expressions to be specified. That configuration
> > > would also have ended up moving data across colos, which is against your
> > > goal.
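> > >
> > > As a fuller sketch of that per-colo layout, the first feed might look
> > > roughly like the entity below. The feed name, frequency, validity dates,
> > > retention, path, ACL, and schema values are all placeholders assumed for
> > > illustration (none of them come from this thread); only the cluster names
> > > and types are taken from the example above. feed2 would be identical
> > > apart from its name and the colo2ETL/colo2A cluster names.
> > >
> > > ```xml
> > > <!-- Hypothetical feed1: one source and one target, both in colo1 -->
> > > <feed name="pve-intermediate-colo1" description="colo1 copy of the dataset"
> > >       xmlns="uri:falcon:feed:0.1">
> > >   <frequency>hours(1)</frequency>
> > >   <clusters>
> > >     <!-- Single source cluster: the colo-local ETL cluster -->
> > >     <cluster name="colo1ETL" type="source">
> > >       <validity start="2014-07-01T00:00Z" end="2099-12-31T00:00Z"/>
> > >       <retention limit="days(7)" action="delete"/>
> > >     </cluster>
> > >     <!-- Target cluster in the same colo, so replication stays local -->
> > >     <cluster name="colo1A" type="target">
> > >       <validity start="2014-07-01T00:00Z" end="2099-12-31T00:00Z"/>
> > >       <retention limit="days(7)" action="delete"/>
> > >     </cluster>
> > >   </clusters>
> > >   <locations>
> > >     <!-- Placeholder path; substitute the real dataset location -->
> > >     <location type="data"
> > >               path="/data/pve-intermediate/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
> > >   </locations>
> > >   <ACL owner="etl" group="users" permission="0755"/>
> > >   <schema location="/none" provider="none"/>
> > > </feed>
> > > ```
> > >
> > > Because each feed has only one source cluster, the partition-expression
> > > check for multiple sources should not be triggered.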
> > >
> > > Thanks,
> > > Satish
> > >
> > >
> > > On Mon, Jul 14, 2014 at 11:52 PM, John Yu <johnyu0520@gmail.com> wrote:
> > >
> > > > Hey all,
> > > >
> > > > We currently have the following use case:
> > > > Colo1 has 1 ETL cluster (Colo1-ETL) and 1 adhoc cluster (Colo1-A)
> > > > Colo2 has 1 ETL cluster (Colo2-ETL) and 1 adhoc cluster (Colo2-A)
> > > >
> > > > Due to the bandwidth constraint between the two colos, we are thinking of
> > > > having the 2 ETL clusters perform the same computation to generate the
> > > > same dataset, and having the 2 adhoc clusters pull from their respective
> > > > colo-local ETL cluster.
> > > >
> > > > What would be a good way to configure this feed?
> > > >
> > > > I've tried the following:
> > > > <cluster name="colo1ETL" type="source">
> > > > <cluster name="colo2ETL" type="source">
> > > > <cluster name="colo1A" type="target">
> > > > <cluster name="colo2A" type="target">
> > > > Error: Partition expression has to be specified for cluster colo1ETL as
> > > > there are more than one source clusters
> > > >
> > > > <cluster name="colo1ETL">
> > > > <cluster name="colo2ETL">
> > > > <cluster name="colo1A" type="target">
> > > > <cluster name="colo2A" type="target">
> > > > Error: Feed: pve-intermediate should have atleast one source cluster
> > > > defined
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > John
> > > >
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> > > the use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> > > privileged information. If you are not the intended recipient you are
> > > hereby notified that any disclosure, copying, distribution or taking any
> > > action in reliance on the contents of this information is strictly
> > > prohibited and may be unlawful. If you have received this communication
> > > in error, please notify us immediately by responding to this email and
> > > then delete it from your system. The firm is neither liable for the
> > > proper and complete transmission of the information contained in this
> > > communication nor for any delay in its receipt.
> > >
> >
> >
> >
> > --
> > 余守中  John Yu (Yu, Shoou-Jong)
> > Mobile: 650-691-3314
> >
>
>



-- 
余守中  John Yu (Yu, Shoou-Jong)
Mobile: 650-691-3314
