hadoop-common-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Cloudera Flume
Date Thu, 17 Mar 2011 03:24:08 GMT
Sorry about that

FYI, about 1 GB/day across 4 collectors at the moment.

On 3/16/11 6:55 PM, James Seigel wrote:
> I believe, sir, there should be a Flume support group at Cloudera. I'm
> guessing most of us here haven't used it and therefore aren't much
> help.
>
> This is vanilla Hadoop land. :)
>
> Cheers and good luck!
> James
>
> On a side note, how much data are you pumping through it?
>
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-03-16, at 7:53 PM, Mark <static.void.dev@gmail.com> wrote:
>
>> Sorry if this is not the correct list to post this on; it was the
>> closest I could find.
>>
>> We are using a taildir('/var/log/foo/') source on all of our agents.
>> If an agent goes down and data cannot be sent to the collector for
>> some time, what happens when the agent becomes available again? Will
>> it tail the whole directory from the beginning of every file, adding
>> duplicate data to our sink?
>>
>> I've read that I could set the startFromEnd parameter to true. In
>> that case, however, if an agent goes down we would lose any data
>> written to our files until the agent comes back up. How do people
>> handle this? It seems you have to accept either duplicate or missing
>> data.
>>
>> Thanks
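
One general way out of the duplicate-or-missing trade-off described in the quoted question is to persist a byte offset per file and resume from it after a restart. The sketch below is not Flume's API and makes no claim about how taildir behaves internally; ship_directory, read_new_lines, the JSON checkpoint file, and the send callback are all hypothetical names chosen for illustration. It assumes newline-terminated log records and a checkpoint file stored outside the tailed directory.

```python
import json
import os


def load_offsets(checkpoint_path):
    """Return the saved {file path: byte offset} map, or {} if none exists yet."""
    try:
        with open(checkpoint_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_offsets(checkpoint_path, offsets):
    """Write offsets atomically so a crash cannot leave a half-written checkpoint."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(offsets, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_path, checkpoint_path)


def read_new_lines(log_path, offsets):
    """Return (complete lines appended since the saved offset, new offset)."""
    start = offsets.get(log_path, 0)
    if start > os.path.getsize(log_path):
        start = 0  # file was truncated or replaced; start over
    lines = []
    with open(log_path, "rb") as fh:
        fh.seek(start)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\n"))
            start += len(raw)
    return lines, start


def ship_directory(log_dir, checkpoint_path, send):
    """Send unseen lines from every file in log_dir, then record progress.

    checkpoint_path is assumed to live outside log_dir. send is a
    hypothetical callback that must raise on failure, so that the offset
    is never advanced past data that was not delivered.
    """
    offsets = load_offsets(checkpoint_path)
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        lines, new_offset = read_new_lines(path, offsets)
        if lines:
            send(lines)
        offsets[path] = new_offset
        save_offsets(checkpoint_path, offsets)
```

Because the offset is saved only after send returns, nothing written while the agent was down is skipped. A crash between send and the checkpoint write can resend at most the last batch, so this gives at-least-once delivery rather than exactly-once; removing that last duplicate would need deduplication on the receiving side. Files renamed by log rotation would also need to be tracked by something more stable than the path, which this sketch does not attempt.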
