flume-user mailing list archives

From "Pritchard, Charles X. -ND" <Charles.X.Pritchard....@disney.com>
Subject Re: flume HA
Date Fri, 11 Oct 2013 17:46:53 GMT
This is not the recommended architecture, just some suggestions.

You could set up a channel fan-out that simply sends the Avro event to two other machines. This
would lead to duplicate entries in HDFS, to be cleaned up later with an MR job. And if those two
other machines fail to connect to the HDFS cluster, you're still out of luck. I don't believe
slave replication is really part of Flume's scope… at that point you may want to point from
Flume to a Kafka instance.
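
For what it's worth, that kind of fan-out is usually expressed in Flume 1.x as a single source
with a replicating channel selector feeding two channels, each drained by its own Avro sink. A
rough sketch (the agent name, hostnames, and ports below are placeholders, not a tested setup):

  # Hypothetical agent "a1": replicate every incoming event to two downstream hosts
  a1.sources = r1
  a1.channels = c1 c2
  a1.sinks = k1 k2

  a1.sources.r1.type = avro
  a1.sources.r1.bind = 0.0.0.0
  a1.sources.r1.port = 4141
  # The replicating selector copies each event into both channels
  a1.sources.r1.channels = c1 c2
  a1.sources.r1.selector.type = replicating

  # Each channel feeds an Avro sink pointed at a different machine
  a1.sinks.k1.type = avro
  a1.sinks.k1.channel = c1
  a1.sinks.k1.hostname = collector-1.example.com
  a1.sinks.k1.port = 4545

  a1.sinks.k2.type = avro
  a1.sinks.k2.channel = c2
  a1.sinks.k2.hostname = collector-2.example.com
  a1.sinks.k2.port = 4545

  a1.channels.c1.type = memory
  a1.channels.c2.type = memory

Memory channels keep this fast but lose events on a crash; swap in file channels if you want
durability on the fan-out hop too.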

However, even if you hand everything off to Kafka, there can be some edge cases, as noted in the Jepsen blog post:
http://aphyr.com/posts/293-call-me-maybe-kafka

The trade-off for stronger guarantees is speed. You can go all the way to writing into HDFS
before completing the transaction, but you're going to see a drop in the number of
transactions you can handle, and you're going to lose efficiency.
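
To make that concrete: pairing a durable file channel with the HDFS sink means the sink's
take transaction isn't committed until the batch has been handed to HDFS, so a smaller batch
size tightens the guarantee at the cost of throughput. A rough sketch (agent name, paths, and
the batch size are placeholders):

  # Hypothetical agent "a1": durable channel, small HDFS batches
  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /var/flume/checkpoint
  a1.channels.c1.dataDirs = /var/flume/data

  a1.sinks.h1.type = hdfs
  a1.sinks.h1.channel = c1
  a1.sinks.h1.hdfs.path = /flume/events/%Y/%m/%d
  a1.sinks.h1.hdfs.fileType = DataStream
  # Smaller batches mean fewer events at risk per transaction,
  # but more round trips to HDFS and lower overall throughput.
  a1.sinks.h1.hdfs.batchSize = 100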

Massive failure and HA are complex topics; there are always trade-offs. This includes HDFS.
If you want to stall your incoming data until it's been successfully written to two separate
HDFS regions, you're going to have some latency.




On Oct 11, 2013, at 2:21 AM, Pascal Taddei <pascal.taddei@amadeus.com> wrote:

I would like to know what the recommended architecture is to guarantee that an event given
to Flume does arrive in HDFS, even in the case of massive failures, machine crashes, and so on.

