flume-user mailing list archives

From SaravanaKumar TR <saran0081...@gmail.com>
Subject Re: Need suggestion on reliable source for log processing
Date Mon, 27 Oct 2014 12:57:15 GMT
That was a good point.

So when a solution claims guaranteed data delivery, that guarantee applies
only once the application has successfully handed the event to the
source/producer; from that point on, the system guarantees delivery through
to the sink/consumer at the other end.

It has no control over whether events reach the source/producer properly in
the first place (i.e. data lost before that point).

So there will always be some chance of data loss while the system is down,
and certain trade-offs have to be accepted.

On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <avila@devlogic.eu> wrote:

> Hi,
>
> Flume, Kafka, or any other system can only be responsible for its own
> actions. From the perspective of Flume's exec source, it simply asks bash
> for whatever the command writes to stdout. It cannot control what bash
> returns.
> Thus, to the source it's not a file, just a stream of text.
>
> The spooling directory source, on the other hand, will resume from the
> file it failed on.
> That reveals two approaches to event consumption: push and pull.
>
> When the push approach is used, the consumer cannot know what comes next,
> or what was sent before it started listening.
>
> Even so, some sources/producers, even those that use the pull approach,
> don't necessarily know how to return to the last event they read. That's
> up to the implementation.
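To make the pull approach concrete: a pull-based reader can only resume where it left off if it persists its own position. Below is a minimal sketch under stated assumptions; the CheckpointedReader class and its JSON state file are hypothetical and not part of Flume 1.5.0. It commits a byte offset only after the caller confirms the events were accepted downstream, so a crash leads to re-reading rather than loss.

```python
import json
import os


class CheckpointedReader:
    """Pull-based log reader that persists its byte offset (hypothetical helper)."""

    def __init__(self, log_path, state_path):
        self.log_path = log_path
        self.state_path = state_path
        self.offset = self._load_offset()

    def _load_offset(self):
        # Resume from the last committed offset, or start at 0 on the first run.
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                return json.load(f)["offset"]
        return 0

    def poll(self):
        """Return (complete lines appended since the last commit, next offset)."""
        with open(self.log_path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only hand out complete lines; a partial trailing line waits for the next poll.
        end = data.rfind(b"\n") + 1
        lines = data[:end].decode("utf-8").splitlines()
        return lines, self.offset + end

    def commit(self, new_offset):
        # Persist the offset only after downstream accepted the events, so a crash
        # before this point means lines are read again rather than lost.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": new_offset}, f)
        os.replace(tmp, self.state_path)  # atomic swap of the state file
        self.offset = new_offset
```

A caller would loop over poll(), deliver the lines, and only then call commit() with the returned offset. This sketch deliberately ignores rotation and truncation: with a rename-at-midnight rotation, the saved offset would point into the renamed file, so a real implementation would also have to track the file's identity (for example, its inode).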
>
> Regards,
> Ahmed
>
>
> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <saran0081986@gmail.com> wrote:
>
>> Yes, I agree.
>>
>> I think no log collection component, such as a Flume source or a Kafka
>> producer, has any feature that marks the exact point in the log file it
>> has consumed up to, so that after a failure it can resume reading from
>> that same point (where it was before the failure).
>>
>> This is the main point where failures become difficult to ignore. Am I
>> right?
>>
>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>
>>> Hi,
>>>
>>> You can use the spillable memory channel, which stores events in memory
>>> and, once that fills up, spills them to disk.
>>> You can also use the file channel, but it's only as fast as your disk, and
>>> because of its heavy I/O it's recommended to give it a separate disk,
>>> preferably an SSD.
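As a rough illustration of the spill-over idea (not Flume's actual spillable memory channel code), the hypothetical SpillableQueue below keeps up to memory_capacity events in an in-memory deque and writes the rest to a SQLite table at an assumed local overflow_path. Once anything has spilled, new events keep going to disk so FIFO order is preserved.

```python
import collections
import json
import sqlite3


class SpillableQueue:
    """Toy FIFO: events live in memory until it fills, then overflow to disk.

    A hypothetical sketch of the spill-over idea, not Flume's channel code.
    """

    def __init__(self, memory_capacity, overflow_path):
        self.memory = collections.deque()
        self.memory_capacity = memory_capacity
        self.db = sqlite3.connect(overflow_path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS overflow "
                "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
            )

    def _overflow_size(self):
        return self.db.execute("SELECT COUNT(*) FROM overflow").fetchone()[0]

    def put(self, event):
        # Once anything has spilled, keep spilling so FIFO order is preserved.
        if len(self.memory) < self.memory_capacity and self._overflow_size() == 0:
            self.memory.append(event)
        else:
            with self.db:  # commits the insert so it survives a crash
                self.db.execute(
                    "INSERT INTO overflow (body) VALUES (?)", (json.dumps(event),)
                )

    def take(self):
        """Return the oldest event, or None if the queue is empty."""
        if self.memory:
            return self.memory.popleft()
        row = self.db.execute(
            "SELECT id, body FROM overflow ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with self.db:
            self.db.execute("DELETE FROM overflow WHERE id = ?", (row[0],))
        return json.loads(row[1])
```

In this sketch, events still sitting in the deque are lost if the process dies, and no channel design, in memory or on disk, helps with log lines written while the agent itself is down.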
>>>
>>> But that will not solve the issue you might run into: if Flume fails for
>>> whatever reason, you'll never be able to continue from the exact point
>>> where it failed.
>>> Yes, the file channel preserves its state, so it will continue with
>>> whatever it had already received, but what about the time while it was
>>> down?
>>>
>>> If you cannot change anything about the application that produces the
>>> logs, then this has to be accepted as a trade-off.
>>>
>>>
>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <saran0081986@gmail.com> wrote:
>>>
>>>> Yes, I understand the concerns with this use case.
>>>>
>>>> If so, we need to configure failover for this scenario. Can we have it at
>>>> the channel level, or at the sink level?
>>>>
>>>> Does Flume support configuring failover in case the channel fills up?
>>>>
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In fact, this is not a problem with Flume.
>>>>>
>>>>> No solution will work reliably for your use case, simply because all of
>>>>> them have to do some sort of tail -f or streaming on a file, and if they
>>>>> can't keep up with it (they mostly can't at high write rates), they will
>>>>> drop some entries.
>>>>> Please be kind to yourself and plan for failures: if you need to restart
>>>>> Flume or any other solution, you'll face dropped entries that you won't
>>>>> be able to re-ingest easily, since in most cases you won't know which
>>>>> ones were dropped.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Ahmed
>>>>>
>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <saran0081986@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the comments, Ahmed.
>>>>>>
>>>>>> So from your comments, I take it that Flume doesn't have any reliable
>>>>>> source option for the use case I described.
>>>>>>
>>>>>> If Flume can't provide it, can you suggest any other log collector
>>>>>> solutions I could consider for moving real-time data to HDFS?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Then you're out of luck, in my opinion, as there is no way other than
>>>>>>> tail -f.
>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>> source/channel to keep up with it. If the channel is full, it will push
>>>>>>> back on the source, and the source will just stop ingesting.
>>>>>>>
>>>>>>> It is possible to hack around this by redirecting tail -f into another
>>>>>>> file and then custom-rotating that duplicate file.
>>>>>>> But I wouldn't recommend it.
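For illustration only, here is a rough sketch of the hack described above, which is advised against: copy lines from a stream (for example, piped in from tail -F) into fixed-size chunk files, write each one in a staging directory, and rename it into the spool directory only once it is complete. The function name, the directory layout, the prefix and the max_lines cutoff are all assumptions; both directories must be on the same filesystem for the rename to be atomic.

```python
import os


def chunk_stream_to_spool(lines, staging_dir, spool_dir, prefix, max_lines=1000):
    """Copy a line stream into closed, fixed-size files in a spool dir (sketch).

    Hypothetical helper: each chunk is written under staging_dir and renamed
    into spool_dir only once complete, so the spooling directory source never
    sees a file that is still growing. Both dirs must share a filesystem.
    """
    published = []
    buf = []
    seq = 0

    def flush(batch, n):
        name = "%s.%06d" % (prefix, n)
        staged = os.path.join(staging_dir, name)
        with open(staged, "w") as out:
            out.writelines(batch)
        final = os.path.join(spool_dir, name)
        os.rename(staged, final)  # atomic only within one filesystem
        published.append(final)

    for line in lines:
        # Make sure every buffered line ends with a newline.
        buf.append(line if line.endswith("\n") else line + "\n")
        if len(buf) == max_lines:
            flush(buf, seq)
            buf, seq = [], seq + 1
    if buf:
        flush(buf, seq)  # publish the final, partial chunk when the stream ends
    return published
```

Lines still buffered when the process dies are lost, tail -F itself does not remember its position across restarts, and a time-based cutoff would need extra logic; all part of why this approach is fragile. A per-run prefix (for example, a start timestamp) keeps chunk names from repeating after a restart, since the spooling directory source does not expect file names to be reused.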
>>>>>>>
>>>>>>> Just a side note: if you're running a Java application (Tomcat or
>>>>>>> similar), you can create multiple output files via log4j.properties
>>>>>>> configuration without the application itself knowing anything about it.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <saran0081986@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ahmed,
>>>>>>>>
>>>>>>>> In my case, the application renames the existing file to
>>>>>>>> <logfile>.yesterdaydate and creates a new <logfile> at 00:00.
>>>>>>>>
>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>
>>>>>>>> Can you suggest any options other than the spooling directory source?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>> producing the log file handles it.
>>>>>>>>> Most applications just reopen the log file when they receive a signal
>>>>>>>>> sent with kill. For example, nginx reopens its log file when it
>>>>>>>>> receives the USR1 signal, without stopping the process. Some
>>>>>>>>> applications might restart as a result.
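The cooperation described above can be sketched in a few lines. The hypothetical ReopeningLog below (POSIX only, since it relies on SIGUSR1) keeps appending to whatever file object it holds, so after an external job renames the file, writes still land in the renamed file until the signal makes it reopen the original path. This is a toy under those assumptions, not how nginx is implemented.

```python
import signal


class ReopeningLog:
    """Appends lines to a path and reopens it on SIGUSR1 (hypothetical, POSIX only)."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "a")
        # Register the handler; signal.signal must be called from the main thread.
        signal.signal(signal.SIGUSR1, self._reopen)

    def _reopen(self, signum, frame):
        # The rotator has renamed the old file; start a fresh one at the same path.
        self.fh.close()
        self.fh = open(self.path, "a")

    def write(self, line):
        self.fh.write(line + "\n")
        self.fh.flush()
```

The window between the rename and the signal is why a rotation job should not hand the renamed file to a consumer until the application has switched over.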
>>>>>>>>>
>>>>>>>>> If the application just reopens the log file, then you can change
>>>>>>>>> your log rotation policy to rotate every minute.
>>>>>>>>> A standard logrotate setup won't cover that, so you'll have to write
>>>>>>>>> a cron job to do it.
>>>>>>>>> In that case, you would keep finished logs and the live log in
>>>>>>>>> separate locations, so the spooling directory source doesn't choke
>>>>>>>>> on an active log file that is still being appended to.
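A per-minute cron job along those lines might look like the sketch below. It assumes the application reopens its log on SIGUSR1 (as nginx does), that app_pid is known, and that staging_dir and spool_dir are on the same filesystem as the live log; rotate_into_spool and the grace_seconds delay are hypothetical choices, not a standard tool. The finished file is published into the spool directory only after the application has been told to switch files, so the spooling directory source should never see a file that is still growing.

```python
import os
import signal
import time


def rotate_into_spool(live_path, staging_dir, spool_dir, app_pid, grace_seconds=1.0):
    """One rotation step for a per-minute cron job (hypothetical helper).

    Assumes the app reopens its log on SIGUSR1 and that all paths share one
    filesystem, so each os.rename is atomic.
    """
    name = "%s.%s" % (os.path.basename(live_path), time.strftime("%Y%m%d%H%M%S"))
    staged = os.path.join(staging_dir, name)

    # 1. Move the live file aside; the app keeps writing to it until it reopens.
    os.rename(live_path, staged)
    # 2. Ask the app to reopen, which creates a fresh file at live_path.
    os.kill(app_pid, signal.SIGUSR1)
    # 3. Give in-flight writes time to land (a heuristic), then publish the file.
    time.sleep(grace_seconds)
    final = os.path.join(spool_dir, name)
    os.rename(staged, final)
    return final
```

The grace period is only a heuristic; a stricter version would confirm that the application has released the old file before publishing it.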
>>>>>>>>>
>>>>>>>>> Anyway, the spooling directory source is the way to go, as it leaves
>>>>>>>>> the log files in place, just renamed.
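The "leave in place, just renamed" behaviour works roughly like the simplified sketch below: each finished file is read in full and then renamed with a suffix, so it stays on disk but is not ingested twice. The .COMPLETED suffix matches the spooling directory source's default fileSuffix; drain_spool_dir and the handle_line callback are hypothetical. This sketch simply re-reads a half-processed file from the start after a crash (so events may be duplicated), which may differ from how Flume itself handles it.

```python
import os

# Matches the default fileSuffix of Flume's spooling directory source.
COMPLETED_SUFFIX = ".COMPLETED"


def drain_spool_dir(spool_dir, handle_line):
    """Ingest each finished file once, then mark it done by renaming (sketch).

    handle_line is a hypothetical callback that delivers one event downstream.
    A file is renamed only after all its lines were handled, so a crash
    mid-file leaves it unmarked and it is re-read on restart: events may be
    duplicated, but not lost.
    """
    pending = sorted(
        name
        for name in os.listdir(spool_dir)
        if not name.endswith(COMPLETED_SUFFIX) and not name.startswith(".")
    )
    done = []
    for name in pending:  # timestamped names sort chronologically
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            for line in f:
                handle_line(line.rstrip("\n"))
        os.rename(path, path + COMPLETED_SUFFIX)  # keep the file, just renamed
        done.append(name)
    return done
```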
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <saran0081986@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using Apache Flume 1.5.0. Here is a quick explanation of the
>>>>>>>>>> setup.
>>>>>>>>>>
>>>>>>>>>> Source: exec, running tail -F on a log file.
>>>>>>>>>>
>>>>>>>>>> Channel: file channel
>>>>>>>>>>
>>>>>>>>>> Sink: HDFS
>>>>>>>>>>
>>>>>>>>>> Use case: move real-time data from the log file to HDFS.
>>>>>>>>>>
>>>>>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>>>>>> if the channel/source is down.
>>>>>>>>>>
>>>>>>>>>> So I tried the other option, the "spooling directory source", which
>>>>>>>>>> is documented as a reliable source. But here I have a single log
>>>>>>>>>> file that data keeps getting appended to, so I don't see a way to
>>>>>>>>>> move the file into a spool directory.
>>>>>>>>>>
>>>>>>>>>> Can anyone suggest another reliable source option for the case where
>>>>>>>>>> data is appended to a log file and the file is rotated only at the
>>>>>>>>>> end of the day?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Saravana
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> This e-mail and any attachment is for authorised use by the intended
>>>>>>>>> recipient(s) only. This email contains confidential information. It
>>>>>>>>> should not be copied, disclosed to, retained or used by, any party
>>>>>>>>> other than the intended recipient. Any unauthorised distribution,
>>>>>>>>> dissemination or copying of this E-mail or its attachments, and/or any
>>>>>>>>> use of any information contained in them, is strictly prohibited and
>>>>>>>>> may be illegal. If you are not an intended recipient then please
>>>>>>>>> promptly delete this e-mail and any attachment and all copies and
>>>>>>>>> inform the sender directly via email. Any emails that you send to us
>>>>>>>>> may be monitored by systems or persons other than the named
>>>>>>>>> communicant for the purposes of ascertaining whether the
>>>>>>>>> communication complies with the law and company policies.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Ahmed Vila | Senior software developer
>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>
>>> Office : +387 33 942 123
>>> Mobile: +387 62 139 348
>>>
>>> Website: www.devlogic.eu
>>> E-mail   : avila@devlogic.eu
>>>
>>
>
>
