flume-user mailing list archives

From SaravanaKumar TR <saran0081...@gmail.com>
Subject Re: Need suggestion on reliable source for log processing
Date Mon, 27 Oct 2014 13:39:32 GMT
Ahmed,

Thanks for your detailed comments.

One final point: in which cases would one of these logging solutions be
considered a perfect system, without any trade-offs?

On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <avila@devlogic.eu> wrote:

> Exactly, that's the point.
>
>
>
>
> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <saran0081986@gmail.com>
> wrote:
>
>> That was a good point.
>>
>> So when a solution claims guaranteed data delivery, the guarantee applies
>> only once the application has successfully handed the event to the
>> source/producer; from that point on, the system guarantees delivery of
>> the event to the sink/consumer at the other end.
>>
>> It has no control over whether events actually reach the source/producer
>> in the first place, so data can be lost there.
>>
>> So there is always a chance of data loss when the system goes down, and
>> certain trade-offs have to be made.
>>
>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>
>>> Hi,
>>>
>>> Flume, Kafka, or any other system can only be responsible for its own
>>> actions. From the perspective of the exec source in Flume: it asks the
>>> shell to give it the command's stdout, and it cannot control what the
>>> shell will return.
>>> Thus, what it sees is not a file, just a stream of text.
>>>
>>> The spooling directory source, by contrast, will resume from the file it
>>> failed on.
>>> That reveals two approaches to event consumption: push and pull.
>>>
>>> With the push approach, the source cannot be aware of what comes next or
>>> of what came before it started to listen.
>>>
>>> Even so, some sources/producers, even if they use the pull approach,
>>> don't necessarily know how to return to the last read event. It's up to
>>> the implementation.
>>>
>>> Regards,
>>> Ahmed
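
The pull-with-resume idea described above can be sketched in Python. This is
only an illustration under stated assumptions, not how Flume or Kafka
implement it: the function names and the checkpoint file are hypothetical,
and the log file is assumed to be append-only (no rotation or truncation
while reading).

```python
import os


def load_offset(checkpoint_path):
    """Return the last saved byte offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path, "r", encoding="ascii") as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_offset(checkpoint_path, offset):
    """Persist the offset atomically: write a temp file, then rename it."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="ascii") as f:
        f.write(str(offset))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, checkpoint_path)


def read_new_lines(log_path, offset):
    """Pull the complete lines that were appended after `offset`.

    Returns (lines, new_offset). A trailing partial line (no newline yet)
    is left alone and picked up by a later call.
    """
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # still being written by the application
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)
    return lines, offset
```

A caller would load the offset, read, hand the lines on to its channel, and
only then call save_offset. A crash between delivery and save means the same
lines are read again after restart (duplicates) rather than skipped.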
>>>
>>>
>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>> saran0081986@gmail.com> wrote:
>>>
>>>> Yes, I agree.
>>>>
>>>> I think no logging solution (a source in Flume, a producer in Kafka)
>>>> has a marking feature that records the exact point up to which it has
>>>> consumed the log file, so that after a failure it could start reading
>>>> again from the point it had reached before the failure.
>>>>
>>>> This is the main point where failures are difficult to ignore. Am I
>>>> right?
>>>>
>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> You can use the spillable memory channel, which stores events in
>>>>> memory and spills to disk once memory fills up.
>>>>> Alternatively, you can use the file channel, but it's only as fast as
>>>>> your disk, and it's suggested to give it a separate disk because of
>>>>> its high I/O, preferably an SSD.
>>>>>
>>>>> But that will not solve the issue you might run into: if Flume fails
>>>>> for whatever reason, you'll never be able to continue from the exact
>>>>> point where it failed.
>>>>> Yes, the file channel preserves its state, so it will continue with
>>>>> whatever it had already received, but what about the time while it
>>>>> was down?
>>>>>
>>>>> If you cannot change anything about the application that produces
>>>>> the logs, then this has to be accepted as a trade-off.
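
The "memory first, then spill to disk" behaviour mentioned above can be
illustrated with a small Python sketch. It is not Flume's implementation:
the SpillableBuffer class is a hypothetical name, events are assumed to be
strings without newlines, the overflow file is assumed not to exist at
start-up, and nothing here survives a restart.

```python
import collections
import os


class SpillableBuffer:
    """FIFO event buffer: memory first, overflow appended to a file."""

    def __init__(self, memory_capacity, overflow_path):
        self.memory_capacity = memory_capacity
        self.overflow_path = overflow_path
        self.memory = collections.deque()
        self.spilled = 0    # events on disk that have not been taken yet
        self.read_pos = 0   # byte offset of the next unread event on disk

    def put(self, event):
        # Once anything is on disk, keep spilling so FIFO order is kept.
        if self.spilled == 0 and len(self.memory) < self.memory_capacity:
            self.memory.append(event)
            return
        with open(self.overflow_path, "ab") as f:
            f.write(event.encode("utf-8") + b"\n")
        self.spilled += 1

    def take(self):
        """Return the oldest event, or None if the buffer is empty."""
        if self.memory:
            return self.memory.popleft()
        if self.spilled == 0:
            return None
        with open(self.overflow_path, "rb") as f:
            f.seek(self.read_pos)
            raw = f.readline()
            self.read_pos = f.tell()
        self.spilled -= 1
        if self.spilled == 0:
            # Disk is drained: drop the file and start from scratch.
            os.remove(self.overflow_path)
            self.read_pos = 0
        return raw.decode("utf-8").rstrip("\n")
```

The buffer only smooths over a slow sink. As noted above, it does nothing
for events the application wrote while the collector itself was down.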
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>> saran0081986@gmail.com> wrote:
>>>>>
>>>>>> Yes, I understand the concerns with this use case.
>>>>>>
>>>>>> If so, we need to configure failover in this scenario. Can we have
>>>>>> it at, say, the channel level or the sink/channel level?
>>>>>>
>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <avila@devlogic.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>
>>>>>>> No solution will function reliably for your use case, simply
>>>>>>> because all of them have to do some sort of tail -f or streaming
>>>>>>> on a file, and if they can't keep up with it (they mostly can't at
>>>>>>> high-speed entry points), they will drop some entries.
>>>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>>>> restart Flume or any other solution, you'll face dropped entries
>>>>>>> that you won't be able to re-ingest easily, as in most cases you
>>>>>>> won't know which ones you've dropped.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>
>>>>>>>> So from your comments, I gather that Flume doesn't have any
>>>>>>>> reliable source option for the use case I described.
>>>>>>>>
>>>>>>>> If Flume can't provide it, can you suggest any other log
>>>>>>>> collector solutions that I could consider here for moving
>>>>>>>> real-time data to HDFS?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <avila@devlogic.eu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Then you're out of luck, in my opinion, as there is no way
>>>>>>>>> other than tail -f.
>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>> source/channel to keep up with it. If the channel is full, it
>>>>>>>>> will push back on the source, and then the source will just
>>>>>>>>> stop ingesting.
>>>>>>>>>
>>>>>>>>> It is possible to hack around this by having tail -f write
>>>>>>>>> into another file and then custom-rotating that duplicate file.
>>>>>>>>> But I wouldn't recommend that.
>>>>>>>>>
>>>>>>>>> Just a side note: if you're running a Java application (Tomcat
>>>>>>>>> or similar), then you can create multiple output files via the
>>>>>>>>> log4j.properties configuration without the application itself
>>>>>>>>> knowing anything about it.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ahmed,
>>>>>>>>>>
>>>>>>>>>> In my case, the application renames the existing file to
>>>>>>>>>> <logfile>.yesterdaydate and creates a new file named <logfile>
>>>>>>>>>> at 00:00.
>>>>>>>>>>
>>>>>>>>>> I can't change the application's log rotation policy for now,
>>>>>>>>>> so I guess I should rule out the spooling directory source in
>>>>>>>>>> my case.
>>>>>>>>>>
>>>>>>>>>> Can you suggest any options other than the spooling directory
>>>>>>>>>> source?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <avila@devlogic.eu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> It all depends on how log rotation is done and how the
>>>>>>>>>>> application producing the log file handles log rotation.
>>>>>>>>>>> Most applications just reopen the log file when they receive
>>>>>>>>>>> a signal (sent with kill). For example, nginx reopens the log
>>>>>>>>>>> file when it receives the USR1 signal, but it doesn't stop
>>>>>>>>>>> the process. Some applications might restart as a result.
>>>>>>>>>>>
>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>> change your log rotation policy to be per minute.
>>>>>>>>>>> The logrotate daemon won't handle that, so you'll have to
>>>>>>>>>>> write a cron job to do it.
>>>>>>>>>>> In that case, you would keep the finished logs and the live
>>>>>>>>>>> log in separate locations so the spooling directory source
>>>>>>>>>>> doesn't freak out about the active log file being appended to.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it
>>>>>>>>>>> will leave log files in place, just renamed.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ahmed
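
The cron-driven rotation described above could look roughly like the Python
sketch below. Everything in it is an assumption made for illustration: the
function name, the timestamp suffix, and the caller-supplied `reopen`
callback, which stands in for whatever makes the application reopen its log
(for example, sending it a signal) and must return only once the application
has switched to the new file. Both directories are assumed to be on the same
filesystem so that the renames are atomic.

```python
import os
import time


def rotate_into_spool(live_log, spool_dir, reopen, now=None):
    """Rotate the live log and hand the finished file to the spool directory.

    Returns the path of the file placed in spool_dir, or None if the live
    log is missing or empty.
    """
    if not os.path.exists(live_log) or os.path.getsize(live_log) == 0:
        return None

    # Timestamp in UTC; `now` is seconds since the epoch (None = current time).
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    staged = live_log + "." + stamp

    # Step 1: rename next to the live log. The application still has the
    # file open and keeps appending to it under the new name.
    os.rename(live_log, staged)

    # Step 2: make the application reopen <live_log> (hypothetical callback).
    reopen()

    # Step 3: only now is the staged file complete, so it can be moved to
    # the directory watched by the spooling directory source.
    target = os.path.join(spool_dir, os.path.basename(staged))
    os.rename(staged, target)
    return target
```

Moving the file into the spool directory only after the reopen step is what
keeps the finished logs separate from the live log that is still being
appended to.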
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of the
>>>>>>>>>>>> setup:
>>>>>>>>>>>>
>>>>>>>>>>>> Source: exec, running tail -F on a log file
>>>>>>>>>>>>
>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>>
>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>
>>>>>>>>>>>> Use case: move real-time data from the log file to HDFS.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It appears that exec is not a reliable source, as we may
>>>>>>>>>>>> lose data if the channel/source is down.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So I tried the other option, the "spooling directory
>>>>>>>>>>>> source", which is described as a reliable source. But here
>>>>>>>>>>>> I have a single log file that data gets appended to, so I
>>>>>>>>>>>> don't see how I could move the file to the spool directory.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can anyone suggest any other reliable source option for the
>>>>>>>>>>>> case where the log file gets appended to and log rotation
>>>>>>>>>>>> happens only at the end of the day?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Saravana
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> This e-mail and any attachment is for authorised use by the
>>>>>>>>>>> intended recipient(s) only. This email contains confidential
>>>>>>>>>>> information. It should not be copied, disclosed to, retained
>>>>>>>>>>> or used by, any party other than the intended recipient. Any
>>>>>>>>>>> unauthorised distribution, dissemination or copying of this
>>>>>>>>>>> E-mail or its attachments, and/or any use of any information
>>>>>>>>>>> contained in them, is strictly prohibited and may be illegal.
>>>>>>>>>>> If you are not an intended recipient then please promptly
>>>>>>>>>>> delete this e-mail and any attachment and all copies and
>>>>>>>>>>> inform the sender directly via email. Any emails that you
>>>>>>>>>>> send to us may be monitored by systems or persons other than
>>>>>>>>>>> the named communicant for the purposes of ascertaining
>>>>>>>>>>> whether the communication complies with the law and company
>>>>>>>>>>> policies.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Best regards,
>>>>> Ahmed Vila | Senior software developer
>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>
>>>>> Office : +387 33 942 123
>>>>> Mobile: +387 62 139 348
>>>>>
>>>>> Website: www.devlogic.eu
>>>>> E-mail   : avila@devlogic.eu
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
