flume-user mailing list archives

From "Bhaskar V. Karambelkar" <bhaska...@gmail.com>
Subject Re: Proper documentation for setting up sink groups
Date Thu, 23 Aug 2012 22:55:02 GMT
Some really insightful explanations, Hari. Thanks.
Btw, I do feel all this should be in the Flume user guide for the greater
good of mankind :)



On Thu, Aug 23, 2012 at 6:45 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

> Please see inline.
>
> --
> Hari Shreedharan
>
>
> On Thursday, August 23, 2012 at 3:28 PM, Bhaskar V. Karambelkar wrote:
>
> > My replies inline, and thanks for the detailed explanations.
> >
> > On Thu, Aug 23, 2012 at 2:57 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> > >
> > > Please see inline.
> > >
> > > --
> > > Hari Shreedharan
> > >
> > >
> > > On Thursday, August 23, 2012 at 11:43 AM, Bhaskar V. Karambelkar wrote:
> > >
> > > > Hi Hari,
> > > > > Yes, I did read the whole guide end to end.
> > > > > But I still have doubts.
> > > > >
> > > > > The fact that multiple sinks can feed from the same channel is
> > > > > news to me. I don't see it explicitly mentioned in the docs,
> > > > > so I guess I wrongly assumed that only one sink can feed from a
> > > > > channel.
> > > > >
> > > > > a) Can you explain in detail how having multiple sinks taking
> > > > > events from one channel is useful in a "fast source, slow sinks"
> > > > > scenario?
> > > > When multiple sinks read events from the same channel, you
> > > > essentially have as many threads taking events out, since each sink
> > > > has at least one thread. So if your source is dumping n events per
> > > > second into the channel, and each sink can only process 1 event per
> > > > second, you could have n sinks to read n events per second (this is
> > > > hypothetical - your hardware and your OS will restrict performance
> > > > when the number of threads starts growing a lot). A channel returns
> > > > an event only once, however many sinks are taking from the channel.
> > > > Once an event is taken and committed, it will never be given to
> > > > another sink. If there is a rollback, it is as if the event was
> > > > never taken, and a different sink will be able to take and commit it.
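
To make this concrete, here is a minimal sketch of two independent sinks draining one channel, with no sink group involved. The agent name a1, channel c1, sink names k1 and k2, and the collector hosts and port are placeholders I made up, and the source definition is left out.

```properties
# Hypothetical agent "a1": one channel, two sinks, no sink group.
a1.channels = c1
a1.sinks = k1 k2

a1.channels.c1.type = memory

# Both sinks name the same channel. Each sink runs on its own thread,
# and an event that is taken and committed goes to exactly one of them.
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = collector1.example.com
a1.sinks.k1.port = 4545

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = collector2.example.com
a1.sinks.k2.port = 4545
```
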
> > >
> >
> >
> > OK this makes sense.
> >
> > > >
> > > > > b) Also, if I read your explanation below correctly, there are 3
> > > > > possible cases:
> > > > >
> > > > > 1) Multiple sinks feeding from a single channel with the default
> > > > > sink processor: this will be like a multiplexing channel, with all
> > > > > sinks getting all the events that come into the channel.
> > > > No, every time a take() is called on the channel, the channel will
> > > > return that event to only one sink. So each sink will get a unique
> > > > event (unless rollbacks happen - in which case the channel will put
> > > > the events back and a different sink might be able to pick them up).
> > >
> >
> >
> > So this situation is exactly like a load balancing one, as events are
> > somewhat equally distributed between all sinks?
> Not necessarily equally distributed. Sinks poll the channel to take
> events. If a sink is slow in polling the channel then it will get fewer
> events, and if a sink is faster then it will get more events, since they
> are running on different threads.
> >
> > > >
> > > > > 2) Multiple sinks feeding from a single channel with the failover
> > > > > sink processor: only one sink will get the events at a given time,
> > > > > with Flume failing over to the next available sink in case the
> > > > > first one fails?
> > > > A sink group essentially treats n sinks like one and, depending on
> > > > the criteria, will select one sink to process the next event from
> > > > the channel. In case of failover, sinks are picked in order of
> > > > priority - and when one sink fails, the next one is picked.
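
For the failover case, a sketch of how the sink group might be wired up, going by the sink processor section of the user guide. The names a1, c1, k1, k2, g1 and the priority values are placeholders; the source and the sink connection details (type, hostname, port) are left out.

```properties
# Hypothetical agent "a1": both sinks of the group point at the same channel.
a1.channels = c1
a1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

# The sink group treats k1 and k2 as one logical sink.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover

# A larger number means a higher priority, so k1 is tried first
# and k2 takes over when k1 fails.
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
```
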
> > >
> >
> >
> > OK this makes sense.
> >
> > > >
> > > > > 3) Multiple sinks feeding from a single channel with the load
> > > > > balancing processor: all sinks getting events in a
> > > > > round-robin/random order.
> > > > No, each sink will get a different event. One sink processes one
> > > > event and the next one picked will process the next event from the
> > > > channel.
> >
> >
> > Yes, that's exactly what I meant. I didn't imply that all sinks get all
> > events, but that the events are distributed more or less equally among
> > the sinks in round-robin/random order.
> > As I said above, this looks almost like #1, except here you have
> > control over the selection algorithm (round-robin/random).
>
> It's not just that you have control. The distribution also does not depend
> on each sink's performance, because all sinks in a group are run from the
> same thread. So slower sinks can slow down the whole process, since only one
> sink reads from the channel at any point in time. Think of a load balancing
> sink selector as a loop which picks one sink and passes the event to that
> one. Since there is only one thread per sink group, having one sink group is
> often slower than having multiple sinks reading from the same channel.
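
For the load balancing case, a sketch of the corresponding sink group config, going by the sink processor section of the user guide. The names a1, c1, k1, k2 and g1 are placeholders; the source and the sink connection details (type, hostname, port) are left out.

```properties
# Hypothetical agent "a1": both sinks of the group point at the same channel.
a1.channels = c1
a1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

# One sink group, so one thread drives both sinks in turn.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance

# Selection algorithm: round_robin or random.
a1.sinkgroups.g1.processor.selector = round_robin
```

Leaving out the sinkgroups lines gives case #1 again, where each sink polls the channel from its own thread.
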
> >
> > > > > Is this a correct assumption? I am aware of #2 and #3, not sure
> > > > > about #1.
> > > >
> > > > > On Thu, Aug 23, 2012 at 12:43 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> > > > > > Did you read this:
> > > > > > http://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
> > > > > >
> > > > > > That explains how to use sink groups. Also there is nothing wrong
> > > > > > with multiple sinks taking events from one channel. This is an
> > > > > > especially useful configuration if you have a very fast source
> > > > > > and much slower sinks.
> > > > >
> > > > >
> > > > > Hari
> > > > >
> > > > > --
> > > > > Hari Shreedharan
> > > > >
> > > > >
> > > > > > On Thursday, August 23, 2012 at 9:28 AM, Bhaskar V. Karambelkar wrote:
> > > > > >
> > > > > > > The sink group document doesn't mention anything about how
> > > > > > > to hook up sink groups to the rest of the config in order to
> > > > > > > work.
> > > > > > >
> > > > > > > E.g. under normal circumstances one channel is linked with one
> > > > > > > sink.
> > > > > > >
> > > > > > > But for a failover sink group, it looks like both the sinks
> > > > > > > should be hooked up to the same channel,
> > > > > > > but this is not mentioned anywhere.
> > > > > > >
> > > > > > > Similarly, what exactly needs to be done for a load balancing
> > > > > > > sink?
> > > > > >
> > > > > > thanks
>
>
>
