From: Chris Neal <cwneal@gmail.com>
To: user@flume.apache.org
Date: Thu, 6 Sep 2012 09:35:26 -0500
Subject: Re: Failover Processor + Load Balanced Processor?

Nice! Thanks :) Will take a look.

On Wed, Sep 5, 2012 at 11:24 PM, Juhani Connolly <juhani_connolly@cyberagent.co.jp> wrote:

> Since there was no response to this, I set up a separate ticket at
> https://issues.apache.org/jira/browse/FLUME-1541 and implemented it as a
> SinkSelector for the LoadBalancingSinkProcessor.
>
> The review can be found at https://reviews.apache.org/r/6939/
>
> Chris: if you're interested, you may want to give this a poke and see if it
> fulfills your needs. The only change in configuration needed is to change
> the selector type from "round_robin" to "round_robin_backoff".
>
>
> On 09/04/2012 07:39 PM, Juhani Connolly wrote:
>
> I'm thinking of working on this (adding backoff semantics to the load
> balancing processor).
>
> The ticket FLUME-1488, however, refers to the load balancing RPC client (or
> is it just poorly worded/unclear?). If it is in fact a separate ticket, I'll
> file one for this.
>
> Anyway, I was interested in hearing thoughts on the approach. I'd have liked
> to do it within the framework of the LoadBalancingSinkProcessor by adding a
> new Selector; however, as it is now, the processor provides no feedback
> to the selectors about whether sinks are working or not, so this can't work.
>
> This leaves two choices: write a new SinkProcessor, or modify the
> SinkSelector interface to give it a couple of callbacks that the processor
> calls to inform the selector of trouble. This shouldn't really be a problem
> even for people who have written their own selectors, so long as they are
> extending AbstractSinkSelector, which can stub the callbacks.
>
> Thoughts?
>
> On 08/18/2012 02:01 AM, Arvind Prabhakar wrote:
>
> Hi,
>
> FYI - the load balancing sink processor does support simple failover
> semantics. The way it works is that if a sink is down, it will proceed to
> the next sink in the group until all sinks are exhausted. The failover sink
> processor, on the other hand, does complex failure handling and back-off,
> such as blacklisting sinks that repeatedly fail. The issue [1] tracks
> enhancing this processor to support backoff semantics.
>
> The one issue with your configuration that I could spot at a quick
> glance is that you are adding your active sinks to both sink groups.
> This does not really work: the configuration subsystem simply flags the
> second inclusion as a problem and ignores it. By design, a sink can either
> be on its own or in one explicit sink group.
>
> [1] https://issues.apache.org/jira/browse/FLUME-1488
>
> Regards,
> Arvind Prabhakar
>
> On Fri, Aug 17, 2012 at 8:59 AM, Chris Neal <cwneal@gmail.com> wrote:
>
>> Hi all.
>>
>> The User Guide talks about the various types of Sink Processors, but
>> doesn't say whether they can be combined. A Failover Processor that
>> moves between 1..n sinks is great, as is a Load Balancing Processor that
>> moves between 1..n sinks, but best of all would be an agent that
>> can use both a Failover Processor AND a Load Balancing Processor!
>>
>> I've created a configuration which I believe supports this, and the
>> Agent starts up and processes events, but I wanted to ping this group to
>> make sure that this configuration is really doing what I think it is doing
>> behind the scenes.
>>
>> Comments?
>>
>> # Define the sources, sinks, and channels for the agent
>> agent.sources = avro-instance_1-source avro-instance_2-source
>> agent.channels = memory-agent-channel
>> agent.sinks = avro-hdfs_1-sink avro-hdfs_2-sink
>> agent.sinkgroups = failover-sink-group lb-sink-group
>>
>> # Bind sources to channels
>> agent.sources.avro-instance_1-source.channels = memory-agent-channel
>> agent.sources.avro-instance_2-source.channels = memory-agent-channel
>>
>> # Define sink group for failover
>> agent.sinkgroups.failover-sink-group.sinks = avro-hdfs_1-sink avro-hdfs_2-sink
>> agent.sinkgroups.failover-sink-group.processor.type = failover
>> agent.sinkgroups.failover-sink-group.processor.priority.avro-hdfs_1-sink = 5
>> agent.sinkgroups.failover-sink-group.processor.priority.avro-hdfs_2-sink = 10
>> agent.sinkgroups.failover-sink-group.processor.maxpenalty = 10000
>>
>> # Define sink group for load balancing
>> agent.sinkgroups = lb-sink-group
>> agent.sinkgroups.group1.sinks = avro-hdfs_1-sink avro-hdfs_2-sink
>> agent.sinkgroups.group1.processor.type = load_balance
>> agent.sinkgroups.group1.processor.selector = round_robin
>>
>> # Bind sinks to channels
>> agent.sinks.avro-hdfs_1-sink.channel = memory-agent-channel
>> agent.sinks.avro-hdfs_2-sink.channel = memory-agent-channel
>>
>> # avro-instance_1-source properties
>> agent.sources.avro-instance_1-source.type = exec
>> agent.sources.avro-instance_1-source.command = tail -F /somedir/Trans.log
>> agent.sources.avro-instance_1-source.restart = true
>> agent.sources.avro-instance_1-source.batchSize = 100
>>
>> # avro-instance_2-source properties
>> agent.sources.avro-instance_2-source.type = exec
>> agent.sources.avro-instance_2-source.command = tail -F /somedir/UDXMLTrans.log
>> agent.sources.avro-instance_2-source.restart = true
>> agent.sources.avro-instance_2-source.batchSize = 100
>>
>> # avro-hdfs_1-sink properties
>> agent.sinks.avro-hdfs_1-sink.type = avro
>> agent.sinks.avro-hdfs_1-sink.hostname = hdfshost1.domain.com
>> agent.sinks.avro-hdfs_1-sink.port = 10000
>>
>> # avro-hdfs_2-sink properties
>> agent.sinks.avro-hdfs_2-sink.type = avro
>> agent.sinks.avro-hdfs_2-sink.hostname = hdfshost2.domain.com
>> agent.sinks.avro-hdfs_2-sink.port = 10000
>>
>> # memory-agent-channel properties
>> agent.channels.memory-agent-channel.type = memory
>> agent.channels.memory-agent-channel.capacity = 20000
>> agent.channels.memory-agent-channel.transactionCapacity = 100
>>
>> Thanks!
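Putting the thread's advice together: per Arvind, each sink may belong to at most one sink group, and per Juhani, the proposed FLUME-1541 patch adds backoff by swapping the selector type. A minimal sketch of a single load-balancing group along those lines (group and sink names are illustrative, and `round_robin_backoff` is the selector name from the patch under review, not necessarily what eventually shipped):

```properties
# One sink group only: by design, a sink can be in at most one group.
agent.sinkgroups = lb-sink-group
agent.sinkgroups.lb-sink-group.sinks = avro-hdfs_1-sink avro-hdfs_2-sink
agent.sinkgroups.lb-sink-group.processor.type = load_balance
# Per the FLUME-1541 review: round-robin that backs off failed sinks,
# instead of the plain "round_robin" selector.
agent.sinkgroups.lb-sink-group.processor.selector = round_robin_backoff
```

This gives both behaviors in one group: round-robin load balancing when all sinks are healthy, and failover-like skipping of sinks that are backing off after a failure.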

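The callback-based design discussed in the thread (the processor informs its selector when a sink fails, so the selector can temporarily skip it) can be sketched as follows. This is a hypothetical, self-contained illustration of the idea only; the class and method names are made up and are not the real Flume `SinkSelector` API.

```java
import java.util.*;

// Hypothetical sketch of backoff-aware round-robin selection. Sinks are
// modeled as plain strings; a real selector would hold Sink instances.
class BackoffRoundRobinSelector {
    private final List<String> sinks;
    private final Map<String, Long> blackoutUntil = new HashMap<>();
    private final long backoffMillis;
    private int next = 0;

    BackoffRoundRobinSelector(List<String> sinks, long backoffMillis) {
        this.sinks = sinks;
        this.backoffMillis = backoffMillis;
    }

    // The callback the processor would invoke when a sink fails.
    void informFailure(String sink, long nowMillis) {
        blackoutUntil.put(sink, nowMillis + backoffMillis);
    }

    // Round-robin over the sinks, skipping any still inside its backoff
    // window. Returns null when every sink is currently backing off.
    String choose(long nowMillis) {
        for (int i = 0; i < sinks.size(); i++) {
            String s = sinks.get((next + i) % sinks.size());
            if (blackoutUntil.getOrDefault(s, 0L) <= nowMillis) {
                next = (next + i + 1) % sinks.size();
                return s;
            }
        }
        return null;
    }
}

public class Demo {
    public static void main(String[] args) {
        BackoffRoundRobinSelector sel =
                new BackoffRoundRobinSelector(Arrays.asList("s1", "s2"), 1000);
        System.out.println(sel.choose(0));    // round-robin starts at s1
        sel.informFailure("s1", 0);           // s1 backs off until t=1000
        System.out.println(sel.choose(0));    // s2
        System.out.println(sel.choose(500));  // s1 still backing off: s2 again
        System.out.println(sel.choose(1500)); // backoff expired: s1
    }
}
```

Because the failure signal arrives via a callback, existing selectors that extend a base class could remain source-compatible if the base class stubs the callback with a no-op, which is the compatibility argument Juhani makes above.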