flume-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: AgentDFOSink loses one message
Date Fri, 23 Dec 2011 12:01:08 GMT
Hi,

As I see it, you found the reason; it looks like the issue described in
flume-agent-reconfigure. It should be fixed in Flume NG
(https://cwiki.apache.org/FLUME/flume-ng.html).
Question forwarded to Arvind :)

- Alex

On Fri, Dec 23, 2011 at 12:35 PM, Tobias Schulze-Heinrichs
<t.schulze@xplosion.de> wrote:
> Hello all,
>
> I guess I found the reason. It is already described here: https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/93194ec410e41fe4.
>
> Summarizing, you could say that the ThriftEventSink doesn't notice that the listener on the other side is down. You write a serialized event to a SocketStream, call flush, and then you do _not_ get an exception.
> So this seems to be the default behavior.
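The flush-without-exception behavior described above is plain TCP, not anything Thrift-specific. A minimal sketch with raw sockets (a generic illustration, not Flume or Thrift code) shows how the first write after the peer dies still "succeeds", and the failure only surfaces on a later write:

```python
import socket
import time

# After the peer closes, the first send() is buffered by the kernel and
# returns without error; only the next send() reports the broken pipe.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

cli = socket.socket()
cli.connect(("127.0.0.1", port))
conn, _ = srv.accept()

conn.close()              # the "listener" goes away, like node2 shutting down
time.sleep(0.2)           # let the FIN reach the client

first_send_ok = True
try:
    cli.sendall(b"event-1\n")   # accepted by the kernel; no exception raised
except OSError:
    first_send_ok = False

time.sleep(0.2)                 # the RST for that data arrives meanwhile
second_send_ok = True
try:
    cli.sendall(b"event-2\n")   # now the broken pipe is reported
except OSError:
    second_send_ok = False

print(first_send_ok, second_send_ok)
```

On Linux this typically prints `True False`, which matches the log: the event is written and flushed with no error, and the "Broken pipe" only shows up on a later append.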
>
> The question for me now is: is this something that should be changed? Can I configure Thrift in a way that it always checks the connection state before sending an event?
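On the question of checking the connection state before sending: one common workaround at the socket level (again a generic sketch, not a Thrift API) is to probe for a pending EOF before each send. A peer that closed cleanly makes the socket readable, and a `MSG_PEEK` read returns empty:

```python
import select
import socket
import time

def peer_probably_alive(sock):
    """Return False if the peer has closed the connection.

    A socket whose peer sent FIN selects as readable, and a MSG_PEEK
    recv() returns b"". Note this check is inherently racy: the peer
    can still die between the check and the next send.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return True
    try:
        return sock.recv(1, socket.MSG_PEEK) != b""
    except OSError:
        return False

# Demo against a throwaway local server:
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.socket()
cli.connect(("127.0.0.1", srv.getsockname()[1]))
conn, _ = srv.accept()

alive_before = peer_probably_alive(cli)   # connection is up
conn.close()
time.sleep(0.2)                           # let the FIN arrive
alive_after = peer_probably_alive(cli)    # EOF is now pending

print(alive_before, alive_after)
```

Because of the race, this narrows the window for lost events rather than eliminating it; an end-to-end acknowledgment is the only reliable fix.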
>
> Greetings
> Tobi
>
>
> On 23.12.2011 at 11:02, Tobias Schulze-Heinrichs wrote:
>
>> Here is a log extract:
>>
>> *** Example 1 - Sending one message ***
>>
>> ** log of node1:
>>
>> 2011-12-23 10:36:00,065 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed
/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103550020+0100.1324632950020572000.00000034
>> 2011-12-23 10:36:00,066 [Roll-TriggerThread-0] INFO diskfailover.NaiveFileFailoverManager:
File lives in /tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103550020+0100.1324632950020572000.00000034
>> 2011-12-23 10:36:00,066 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: constructed
new seqfile event sink: file=/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103600066+0100.1324632960066504000.00000034
>> 2011-12-23 10:36:00,069 [FileFailover-36] INFO diskfailover.NaiveFileFailoverManager:
opening new file for 20111223-103550020+0100.1324632950020572000.00000034
>> 2011-12-23 10:36:00,071 [FileFailover-36] INFO diskfailover.DiskFailoverSource: end
of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@75fc25e5
>> this is my message <------ here goes my message
>> 2011-12-23 10:36:04,427 [logicalNode node1-29] INFO thrift.ThriftEventSink: ThriftEventSink
on port 8999 closed
>> 2011-12-23 10:36:04,427 [logicalNode node1-29] INFO thrift.ThriftEventSink: ThriftEventSink
on port 8999 closed
>> 2011-12-23 10:36:10,108 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed
/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103600066+0100.1324632960066504000.00000034
>> 2011-12-23 10:36:10,108 [Roll-TriggerThread-0] INFO diskfailover.NaiveFileFailoverManager:
File lives in /tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103600066+0100.1324632960066504000.00000034
>> 2011-12-23 10:36:10,109 [FileFailover-36] INFO diskfailover.NaiveFileFailoverManager:
opening new file for 20111223-103600066+0100.1324632960066504000.00000034
>> 2011-12-23 10:36:10,110 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: constructed
new seqfile event sink: file=/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103610109+0100.1324632970109909000.00000034
>> 2011-12-23 10:36:10,112 [FileFailover-36] INFO diskfailover.DiskFailoverSource: end
of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@7cd2b918
>> 2011-12-23 10:36:20,149 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed
/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103610109+0100.1324632970109909000.00000034
>> 2011-12-23 10:36:20,149 [Roll-TriggerThread-0] INFO diskfailover.NaiveFileFailoverManager:
File lives in /tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103610109+0100.1324632970109909000.00000034
>> 2011-12-23 10:36:20,151 [FileFailover-36] INFO diskfailover.NaiveFileFailoverManager:
opening new file for 20111223-103610109+0100.1324632970109909000.00000034
>> 2011-12-23 10:36:20,151 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: constructed
new seqfile event sink: file=/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103620151+0100.1324632980151135000.00000034
>> 2011-12-23 10:36:20,153 [FileFailover-36] INFO diskfailover.DiskFailoverSource: end
of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@6e7616ad
>>
>> **** Node2 was down and is now restarted. Here is what you can see there: ****
>>
>> 2011-12-23 10:36:39,460 [main] INFO conf.SinkFactoryImpl: Found sink decorator profilingDecorator
in de.xplosion.flumeplugins.ProfilingDecorator
>> 2011-12-23 10:36:39,491 [main] INFO agent.LogicalNode: Node config successfully set
to FlumeConfigData: {srcVer:'Thu Jan 01 01:00:00 CET 1970' snkVer:'Thu Jan 01 01:00:00 CET
1970'  ts='Thu Jan 01 01:00:00 CET 1970' flowId:'null' source:'null' sink:'null' }
>> 2011-12-23 10:36:39,607 [main] INFO agent.FlumeNode: Hadoop Security enabled: false
>> 2011-12-23 10:36:43,990 [logicalNode node2-23] INFO collector.CollectorSource: opened
>> 2011-12-23 10:36:43,990 [Check config] INFO agent.LogicalNode: Node config successfully
set to FlumeConfigData: {srcVer:'Fri Dec 23 10:23:16 CET 2011' snkVer:'Fri Dec 23 10:23:16
CET 2011'  ts='Fri Dec 23 10:23:16 CET 2011' flowId:'default-flow' source:'collectorSource(
8999 )' sink:'console' }
>> 2011-12-23 10:36:43,999 [logicalNode node2-23] INFO thrift.ThriftEventSource: Starting
blocking thread pool server on port 8999...
>> 2011-12-23 10:36:44,001 [logicalNode node2-23] INFO debug.ConsoleEventSink: ConsoleEventSink(
debug ) opened
>>
>> As you can see, no message arrived there. When I look at the agent's working dir, I
can see only one file, in the dfo_writing directory. All other directories are empty:
>>
>> /tmp ls -R /tmp/flume-t.schulze/agent/node1/
>> dfo_error     dfo_import      dfo_logged      dfo_sending     dfo_writing
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_error:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_import:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_logged:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_sending:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_writing:
>> 20111223-105235165+0100.1324633955165498000.00000028
>>
>> #####################################################################################################
>>
>> Now let's run another example with multiple messages:
>>
>>
>> *** Node1 takes the messages from the console
>>
>> 2011-12-23 10:40:01,346 [Roll-TriggerThread-0] INFO diskfailover.NaiveFileFailoverManager:
File lives in /tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-103951303+0100.1324633191303132000.00000034
>> 2011-12-23 10:40:01,347 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: constructed
new seqfile event sink: file=/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104001347+0100.1324633201347174000.00000034
>> 2011-12-23 10:40:01,348 [FileFailover-36] INFO diskfailover.NaiveFileFailoverManager:
opening new file for 20111223-103951303+0100.1324633191303132000.00000034
>> 2011-12-23 10:40:01,350 [FileFailover-36] INFO diskfailover.DiskFailoverSource: end
of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@72ebf3f0
>> m1 <--- First message
>> m2 <--- Second message
>> 2011-12-23 10:40:04,208 [logicalNode node1-29] INFO core.BackOffFailOverSink: Append
failed java.net.SocketException: Broken pipe
>> m3 <--- Third message
>> 2011-12-23 10:40:05,587 [logicalNode node1-29] INFO thrift.ThriftEventSink: ThriftEventSink
on port 8999 closed
>> 2011-12-23 10:40:05,589 [logicalNode node1-29] INFO thrift.ThriftEventSink: ThriftEventSink
on port 8999 closed
>> m4 <--- Fourth message
>> 2011-12-23 10:40:11,385 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed
/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104001347+0100.1324633201347174000.00000034
>> 2011-12-23 10:40:11,385 [Roll-TriggerThread-0] INFO diskfailover.NaiveFileFailoverManager:
File lives in /tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104001347+0100.1324633201347174000.00000034
>> 2011-12-23 10:40:11,385 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: constructed
new seqfile event sink: file=/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104011385+0100.1324633211385600000.00000034
>> 2011-12-23 10:40:11,386 [FileFailover-36] INFO diskfailover.NaiveFileFailoverManager:
opening new file for 20111223-104001347+0100.1324633201347174000.00000034
>> 2011-12-23 10:40:11,391 [FileFailover-36] INFO debug.StubbornAppendSink: append failed
on event '10.0.0.91 [INFO Fri Dec 23 10:40:05 CET 2011] m3' with error: Append failed java.net.SocketException:
Broken pipe
>> 2011-12-23 10:40:11,391 [FileFailover-36] INFO thrift.ThriftEventSink: ThriftEventSink
on port 8999 closed
>> 2011-12-23 10:40:11,392 [FileFailover-36] INFO debug.InsistentOpenDecorator: open
attempt 0 failed, backoff (1000ms): Failed to open thrift event sink to localhost:8999 : java.net.ConnectException:
Connection refused
>> 2011-12-23 10:40:12,401 [FileFailover-36] INFO debug.InsistentOpenDecorator: open
attempt 1 failed, backoff (2000ms): Failed to open thrift event sink to localhost:8999 : java.net.ConnectException:
Connection refused
>> 2011-12-23 10:40:14,402 [FileFailover-36] INFO debug.InsistentOpenDecorator: open
attempt 2 failed, backoff (4000ms): Failed to open thrift event sink to localhost:8999 : java.net.ConnectException:
Connection refused
>> 2011-12-23 10:40:18,405 [FileFailover-36] INFO thrift.ThriftEventSink: ThriftEventSink
to localhost:8999 opened
>> 2011-12-23 10:40:18,405 [FileFailover-36] INFO debug.InsistentOpenDecorator: Opened
ThriftEventSink on try 3
>> 2011-12-23 10:40:18,406 [FileFailover-36] INFO diskfailover.DiskFailoverSource: end
of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@46dab859
>> 2011-12-23 10:40:21,492 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed
/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104011385+0100.1324633211385600000.00000034
>> 2011-12-23 10:40:21,492 [Roll-TriggerThread-0] INFO diskfailover.NaiveFileFailoverManager:
File lives in /tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104011385+0100.1324633211385600000.00000034
>> 2011-12-23 10:40:21,492 [FileFailover-36] INFO diskfailover.NaiveFileFailoverManager:
opening new file for 20111223-104011385+0100.1324633211385600000.00000034
>> 2011-12-23 10:40:21,493 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: constructed
new seqfile event sink: file=/tmp/flume-t.schulze/agent/node1/dfo_writing/20111223-104021493+0100.1324633221493240000.00000034
>> 2011-12-23 10:40:21,495 [FileFailover-36] INFO diskfailover.DiskFailoverSource: end
of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@417cff66
>>
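>>
>> The InsistentOpenDecorator lines above show a doubling backoff (1000ms, 2000ms, 4000ms, then success on try 3). As a rough sketch of that retry pattern (assumed semantics; the real Flume decorator differs in its details):

```python
import time

def insistent_open(open_fn, max_tries=10, initial_backoff_ms=1000, cap_ms=8000):
    """Retry open_fn with doubling backoff, like the log lines above:
    attempt 0 -> 1000ms, attempt 1 -> 2000ms, attempt 2 -> 4000ms, ...
    """
    backoff = initial_backoff_ms
    for attempt in range(max_tries):
        try:
            result = open_fn()
            print(f"Opened sink on try {attempt}")
            return result
        except ConnectionError as e:
            print(f"open attempt {attempt} failed, backoff ({backoff}ms): {e}")
            time.sleep(backoff / 1000)
            backoff = min(backoff * 2, cap_ms)
    raise ConnectionError(f"giving up after {max_tries} tries")

# Demo: fail three times, then succeed (tiny backoff so the demo is fast).
attempts = {"n": 0}
def fake_open():
    attempts["n"] += 1
    if attempts["n"] <= 3:
        raise ConnectionError("Connection refused")
    return "sink"

sink = insistent_open(fake_open, initial_backoff_ms=1)
```

With three failures this prints attempts 0 through 2 failing and then "Opened sink on try 3", matching the shape of the log extract.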
>> *** Now let's see what happens on Node2, which should receive the messages. Please
note: while all 4 messages were sent, Node2 was down; it is now restarted:
>>
>>
>> 2011-12-23 10:40:12,630 [main] INFO conf.SinkFactoryImpl: Found sink decorator profilingDecorator
in de.xplosion.flumeplugins.ProfilingDecorator
>> 2011-12-23 10:40:12,662 [main] INFO agent.LogicalNode: Node config successfully set
to FlumeConfigData: {srcVer:'Thu Jan 01 01:00:00 CET 1970' snkVer:'Thu Jan 01 01:00:00 CET
1970'  ts='Thu Jan 01 01:00:00 CET 1970' flowId:'null' source:'null' sink:'null' }
>> 2011-12-23 10:40:12,774 [main] INFO agent.FlumeNode: Hadoop Security enabled: false
>> 2011-12-23 10:40:17,371 [Check config] INFO agent.LogicalNode: Node config successfully
set to FlumeConfigData: {srcVer:'Fri Dec 23 10:23:16 CET 2011' snkVer:'Fri Dec 23 10:23:16
CET 2011'  ts='Fri Dec 23 10:23:16 CET 2011' flowId:'default-flow' source:'collectorSource(
8999 )' sink:'console' }
>> 2011-12-23 10:40:17,374 [logicalNode node2-23] INFO collector.CollectorSource: opened
>> 2011-12-23 10:40:17,383 [logicalNode node2-23] INFO thrift.ThriftEventSource: Starting
blocking thread pool server on port 8999...
>> 2011-12-23 10:40:17,386 [logicalNode node2-23] INFO debug.ConsoleEventSink: ConsoleEventSink(
debug ) opened
>> 10.0.0.91 [INFO Fri Dec 23 10:40:05 CET 2011] m3
>> 10.0.0.91 [INFO Fri Dec 23 10:40:07 CET 2011] m4
>>
>> You can see that only 2 out of 4 events arrived (m3 and m4).
>>
>> If I look at the agent's working dir (while node2 was down), I can see:
>>
>> /tmp ls -R /tmp/flume-t.schulze/agent/node1/
>> dfo_error     dfo_import      dfo_logged      dfo_sending     dfo_writing
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_error:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_import:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_logged:
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_sending:
>> 20111223-105335417+0100.1324634015417110000.00000028
>>
>> /tmp/flume-t.schulze/agent/node1//dfo_writing:
>> 20111223-105345460+0100.1324634025460683000.00000028
>>
>>
>>
>> I hope this info helps you to reproduce or understand the problem.
>>
>> Best regards
>> Tobi
>>
>>
>>
>>
>>
>> On 22.12.2011 at 16:05, alo alt wrote:
>>
>>> Tobi,
>>>
>>> could you please attach a log extract? And check the flume directories:
>>> ls -la /tmp/flume/agent/AGENTNAME/*
>>>
>>> for entries, especially in sending / sent.
>>>
>>> - Alex
>>>
>>> On Thu, Dec 22, 2011 at 3:41 PM, Tobias Schulze-Heinrichs
>>> <t.schulze@xplosion.de> wrote:
>>>> It seems like there is the possibility that an event gets lost when using
the AgentDFOSink. Please consider the following (minimized) setup:
>>>>
>>>> node1 : console | agentDFOSink("localhost", 8999);
>>>> node2 : collectorSource( 8999 ) | console;
>>>>
>>>> Both nodes are controlled by one master
>>>>
>>>> * Now I sent a couple of events to node1. All events arrived at the console
of node2.
>>>> * Shutdown node2
>>>> * Send 3 events to node1
>>>> * Restart node2
>>>>
>>>> ------
>>>> Expected result: 3 events appear at the newly started node2
>>>> Got: 2 events arrive at node2. The first one is missing
>>>>
>>>> Any ideas what might have gone wrong? If I look at the log output of node2, it
seems like the first event doesn't trigger the insistentAppend decorator. After the 2nd event
I got a couple of retries to append the event … maybe that is a hint.
>>>>
>>>> Thanks & Best regards
>>>> Tobi
>>>
>>>
>>>
>>> --
>>> Alexander Lorenz
>>> http://mapredit.blogspot.com
>>>
>>> P Think of the environment: please don't print this email unless you
>>> really need to.
>>
>



-- 
Alexander Lorenz
http://mapredit.blogspot.com

P Think of the environment: please don't print this email unless you
really need to.
