flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Ordering of messages in flume-ng
Date Wed, 12 Feb 2014 16:19:05 GMT
Hi,

In the case of no failures, with a single source, single channel, and single
sink, you will see ordering preserved. However, I believe that when there is
a failure, the file channel will change the ordering on rollback.

If strict ordering is required, it's advisable to assign sequence numbers
upstream and then re-order the data with either an MR job or an Impala query
once it lands in HDFS.
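
A minimal sketch of the sequence-number idea, in plain Python rather than an
actual MR job or Impala query. The function names (stamp, reorder), the record
layout, and the "agent1" source id are illustrative assumptions, not Flume
APIs; the shuffle only simulates events arriving out of order.

```python
import itertools
import random


def stamp(bodies, source_id):
    """Attach a per-source sequence number to each event before it is sent."""
    counter = itertools.count()
    return [
        {"source": source_id, "seq": next(counter), "body": body}
        for body in bodies
    ]


def reorder(records):
    """Restore per-source order after delivery by sorting on (source, seq)."""
    return sorted(records, key=lambda r: (r["source"], r["seq"]))


# Upstream: number the events as they are produced.
stamped = stamp(["a", "b", "c", "d"], "agent1")

# In transit: simulate the pipeline delivering events out of order.
delivered = list(stamped)
random.Random(0).shuffle(delivered)

# Downstream: sort once the data has landed.
restored = reorder(delivered)
```

Sorting on (source, seq) only restores order within each source; a total order
across several agents would need some additional agreed-upon key.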

Brock


On Wed, Feb 12, 2014 at 12:02 AM, Christopher Shannon <cshannon108@gmail.com
> wrote:

> Interesting question.
>
> I can't answer it, but I would like to know what strategies others have
> pursued if they have had a need to order their data after it gets to the
> end of the Flume pipeline.
>
> - C.
>
>
> On Tue, Feb 11, 2014 at 11:52 PM, Chris Schneider <
> chris@christopher-schneider.com> wrote:
>
>> I've seen a fair number of resources on the web that describe the loose
>> ordering guarantees that flume offers for messages in the face of
>> degradation or failures.  But I can't tell what applies to flume-og, and
>> flume-ng.  Hopefully somebody can help clear up the situation.
>>
>> In the case of a single agent topology, (source -> FileSystem Channel ->
>> sink), can messages become out of order?  What situations cause that?
>>
>> In a multi agent topology, does that answer change?
>>
>> (Agent 1   Source -> FilesystemChannel -> Avro To Collector)
>> (Agent 2   Source -> FilesystemChannel -> Avro To Collector)
>> (Collector Avro from agents -> FilesystemChannel -> final Sink)
>>
>> And perhaps in an even more complicated setup, with multiple collectors,
>> does that answer change further?
>>
>>
>>
>>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
