flume-user mailing list archives

From "Hari Shreedharan" <hshreedha...@cloudera.com>
Subject RE: All channels in an agent get slower after a channel is full
Date Fri, 14 Nov 2014 19:27:02 GMT
What we have seen is that if the keep-alive is set to 0, there is no waiting on the semaphore,
so the puts fail immediately; the constant failures cause high CPU burn and also a lot of network
activity from the failure messages being sent. Keeping a low keep-alive makes the situation much
better. I would recommend not setting the keep-alive to zero.
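
For reference, a minimal sketch of what a low but non-zero keep-alive looks like in a memory
channel's configuration (the agent/channel names and the value of 1 second are only examples):

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
# seconds a put (or take) waits for free space (or data) before failing; the default is 3
a1.channels.c1.keep-alive = 1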


Thanks,
Hari

On Fri, Nov 14, 2014 at 2:38 AM, <j.guilmard@accenture.com> wrote:

> Yes Hari, that’s exactly my point: by default, whatever the channel type (Memory or
File), one channel filling up will slow down the associated Source, and therefore any other
channel attached to that source. (By extension it will also impact any client sending events to
this source, as the source will acknowledge the events more slowly.)
> The higher the keep-alive is (default 3sec), the bigger the global impact will be.
> Vincentius might reduce this impact by lowering the keep-alive of his channels to 1 second
(lowest possible value).
> What do you think of a future enhancement adding an optional channel configuration to change
the keep-alive time unit (defaulting to seconds for backward compatibility), so that users
like Vincentius and me could set the keep-alive to something like 100 ms, combined with channels
marked “optional”, for example?
> Also, could you elaborate on the consequences of having a keep-alive = 0? I understand
that in a channel-full situation the tryAcquire will fail immediately, without waiting, but
I do not understand the possibility of the semaphore dying.
> Regards
> Jeff
> From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> Sent: Friday, November 14, 2014 01:29
> To: user@flume.apache.org
> Cc: user@flume.apache.org
> Subject: RE: All channels in an agent get slower after a channel is full
> It is expected that if one channel is full, the whole batch is considered failed, and
the source will retry. If even one required channel is full, the whole transaction fails.
If you don’t want this, mark the channels as optional.
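> For example (just a sketch with example component names), marking channel c2 as optional on a
replicating selector means a failed put into c2 will not fail the whole transaction:
> a1.sources.r1.channels = c1 c2
> a1.sources.r1.selector.type = replicating
> a1.sources.r1.selector.optional = c2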
> Also, all channels have a keep-alive, which is the period (in seconds) a put will wait
before failing when the channel has no free space. You can reduce this via configuration. If you
reduce it to 0, it may cause major concurrency issues (semaphores start dying, etc.). Things
slowing down could be because of this as well.
> Thanks,
> Hari
> On Thu, Nov 13, 2014 at 4:22 PM, j.guilmard@accenture.com <j.guilmard@accenture.com> wrote:
> Hi Hari,
> I’m jumping into this discussion as I’m facing similar behavior when a channel fills up.
> I was trying to optimize an HTTPSink that does not sustain the performance it should
when I hit the same issue as described below, but with MemoryChannels:
> 1 source (let’s say Avro), with a replicating selector duplicating the events into 2
MemoryChannels.
> When one MemoryChannel is full, the other one slows down, and even worse, the Source
slows down as well.
> So initially I suspected my particular Sink of having an effect on other threads or on the
JVM. I removed it and tried a very simple config:
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> a1.sources.r1.type = avro
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = 0.0.0.0
> a1.sources.r1.port = 1234
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000
> a1.sinks.k1.type = avro
> a1.sinks.k1.channel = c1
> a1.sinks.k1.hostname = 127.0.0.1
> a1.sinks.k1.port = 3456
> I put another agent listening for the Avro events on 3456, injected load into the
main one, then stopped the listener agent.
> =>  The channel c1 is of course filling up… but the source is impacted as well,
by the channel.
> The thread dump is explicit:
> "New I/O  worker #15" prio=6 tid=0x000000000d252000 nid=0x2990 waiting on condition [0x0000000010cee000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>                 at sun.misc.Unsafe.park(Native Method)
>                 - parking to wait for  <0x00000007818f9c00> (a java.util.concurrent.Semaphore$NonfairSync)
>                 at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>                 at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>                 at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>                 at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:588)
>                 at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
>                 at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>                 at org.apache.flume.channel.ChannelProcessor.processEvent(ChannelProcessor.java:267)
>                 at org.apache.flume.source.AvroSource.append(AvroSource.java:348)
>                 at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>                 at java.lang.reflect.Method.invoke(Method.java:606)
>                 at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:88)
>                 at org.apache.avro.ipc.Responder.respond(Responder.java:149)
>                 at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
> The source gets stuck on these commits until the “keep-alive” timeout expires. I
cannot lower this keep-alive much, as the lowest possible value seems to be 1 second (the unit is seconds).
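> To illustrate the mechanism, here is a standalone sketch using plain java.util.concurrent (not
Flume’s actual code): a timed tryAcquire on a semaphore with no free permits blocks the caller
for the full timeout before returning false, which matches the keep-alive stall in the stack trace above.
> import java.util.concurrent.Semaphore;
> import java.util.concurrent.TimeUnit;
>
> public class KeepAliveStall {
>     public static void main(String[] args) throws InterruptedException {
>         Semaphore space = new Semaphore(0);  // 0 permits ~ a channel with no free capacity
>         long start = System.nanoTime();
>         // waits up to 3 seconds (the default keep-alive) for a permit, then gives up
>         boolean acquired = space.tryAcquire(1, 3, TimeUnit.SECONDS);
>         long elapsedMs = (System.nanoTime() - start) / 1_000_000;
>         System.out.println("acquired=" + acquired + " after " + elapsedMs + " ms");
>     }
> }
> This prints acquired=false after roughly 3000 ms; with a timeout of 0 it returns immediately,
trading the stall for constant failures and retries.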
> To put it in a nutshell, I don’t know if this behavior is expected, but if one channel
is filling up (at least a MemoryChannel), as far as I understand it will impact any other channel
linked to the same source, and will impact the Source itself.
> Do you see any way to prevent a Source from being impacted by a channel filling up?
In my specific scenario, I would prefer losing some events, or at least keeping the other channels
working.
> PS: I’m using Flume 1.5 for these tests.
> Regards
> From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> Sent: Thursday, November 13, 2014 22:04
> To: user@flume.apache.org
> Cc: user@flume.apache.org
> Subject: Re: All channels in an agent get slower after a channel is full
> Yeah, when you are sharing disks, one channel’s behavior would affect the others,
since your disk is your bottleneck.
> Thanks,
> Hari
> On Thu, Nov 13, 2014 at 1:02 PM, Vincentius Martin <vincentiusmartin@gmail.com> wrote:
> Right now, I am using FileChannel.
> Thanks
> Regards,
> Vincentius Martin
> On Fri, Nov 14, 2014 at 4:00 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> Are you using MemoryChannel or File Channel?
> Thanks,
> Hari
> On Thu, Nov 13, 2014 at 12:59 PM, Vincentius Martin <vincentiusmartin@gmail.com> wrote:
> Yes, they are sharing the same disk.
> I previously tried it with a memory channel, and it produced the same impact when a channel
in an agent with many channels reached its capacity. It caused a ChannelException and
made the other channels slower.
> Regards,
> Vincentius Martin
> On Fri, Nov 14, 2014 at 3:47 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> Are all the channels sharing the same disk(s)?
> Thanks,
> Hari
> On Thu, Nov 13, 2014 at 12:44 PM, Vincentius Martin <vincentiusmartin@gmail.com> wrote:
> It is between agents. I am using Avro sinks and file channels, and all of those channels
write their checkpoints to a disk.
> For the rest, I am using the default configuration.
> Regards,
> Vincentius Martin
> On Fri, Nov 14, 2014 at 1:39 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> What does your configuration look like? What sink are you using?
> On Thu, Nov 13, 2014 at 8:23 AM, Vincentius Martin <vincentiusmartin@gmail.com> wrote:
> Hi,
> In my cluster, I have an agent with one source connected to multiple channels. Each channel
is connected to a different sink (1 channel paired with 1 sink), which sends events to a different
agent (a one-to-many relation), just like the multiplexing flow example in the Flume user
guide.
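> A minimal sketch of that layout (component names, hosts and ports are only examples):
> agent.sources = r1
> agent.channels = c1 c2
> agent.sinks = k1 k2
> agent.sources.r1.channels = c1 c2
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = avro
> agent.sinks.k1.hostname = host-a
> agent.sinks.k1.port = 4141
> agent.sinks.k2.channel = c2
> agent.sinks.k2.type = avro
> agent.sinks.k2.hostname = host-b
> agent.sinks.k2.port = 4141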
> However, when a channel reaches its capacity (is already full), I see that the agent’s
performance gets slower.
> What I mean by getting slower is that all other channel-sink pairs in that agent also
get slower when sending events to their destinations. I can understand the overfilled channel-sink
pair getting slower, but why does it affect the other channel-sink pairs in that agent? From what I
see here, the other pairs should be independent of the overfilled channel, except that they
use the same source, right?
> Thanks!
> Regards,
> Vincentius Martin