flume-user mailing list archives

From SaravanaKumar TR <saran0081...@gmail.com>
Subject Re: Need suggestion on reliable source for log processing
Date Mon, 27 Oct 2014 11:48:08 GMT
Yes, I agree.

I think no log-collection solution, whether a Flume source or a Kafka
producer, has a marking feature that records the exact point up to which it
has consumed a logfile, so that after a failure it can resume reading from
that same point in the logfile.

This is the main reason such failures are difficult to ignore. Am I right?
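
To make the "marking feature" concrete, here is a minimal sketch of what I
have in mind. It is not something Flume or Kafka provides as far as this
thread establishes; the names read_new_lines and save_checkpoint and the JSON
checkpoint file are hypothetical. It assumes newline-terminated log lines and
a filesystem where a rotated (renamed and recreated) logfile gets a new inode.
Lines still unread in the old file at rotation time are not recovered here.

```python
import json
import os
import tempfile


def read_new_lines(log_path, checkpoint_path):
    """Return (lines, state): complete lines appended since the checkpoint."""
    state = {"inode": None, "offset": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    inode = os.stat(log_path).st_ino
    # A different inode means the logfile was rotated, so the new file is
    # read from its beginning instead of from the saved offset.
    offset = state["offset"] if state["inode"] == inode else 0

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; read it next time
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\n"))
            offset += len(raw)
    return lines, {"inode": inode, "offset": offset}


def save_checkpoint(checkpoint_path, state):
    """Persist the state atomically; call only after the lines are delivered."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, checkpoint_path)
```

The idea is to call read_new_lines, hand the lines to the next hop, and only
then call save_checkpoint. If the reader dies in between, the same lines are
read again after restart, so this gives duplicates rather than loss.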

On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <avila@devlogic.eu> wrote:

> Hi,
>
> You can use a spillable channel, which stores events in memory and spills
> to disk once memory fills up.
> You can also use the file channel, but it is only as fast as your disk.
> Because of its high I/O, a separate disk is suggested for it, preferably
> an SSD.
>
> But that will not solve the issue you might run into: if Flume fails for
> whatever reason, you'll never be able to continue from the exact point
> where it failed.
> Yes, the file channel preserves its state, so it will continue with
> whatever it had already received, but what about the time while it was down?
>
> If you cannot change anything about the application that produces the
> logs, then that limitation has to be accepted as a trade-off.
>
>
> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
> saran0081986@gmail.com> wrote:
>
>> Yes, I understand the concerns with this use case.
>>
>> If so, we need to configure failover for this scenario. Can we have it
>> at the channel level or the sink level?
>>
>> Does Flume support configuring failover in case the channel fills up?
>>
>>
>>
>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>
>>> Hi,
>>>
>>> In fact, this is not a problem with Flume.
>>>
>>> No solution will function reliably for your use case, simply because all
>>> of them have to do some sort of tail -f or streaming on a file, and if
>>> they can't keep up with it (they mostly can't at high-speed entry points),
>>> they will drop some entries.
>>> Please be kind to yourself and plan for failures: if you need to
>>> restart Flume or any other solution, you'll face dropped entries that
>>> you won't be able to re-ingest easily, as in most cases you won't know
>>> which ones you've dropped.
>>>
>>>
>>> Regards,
>>> Ahmed
>>>
>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>> saran0081986@gmail.com> wrote:
>>>
>>>> Thanks for the comments, Ahmed.
>>>>
>>>> So from your comments, I gather that Flume doesn't have any reliable
>>>> source option for the use case I described.
>>>>
>>>> If Flume can't provide it, can you suggest any other log collector
>>>> that I could consider here to move real-time data to HDFS?
>>>>
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <avila@devlogic.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Then you're out of luck, in my opinion, as there is no way other than
>>>>> tail -f.
>>>>> The problem with tail -f is that tail will not wait for the
>>>>> source/channel to keep up with it. If the channel is full, it will back
>>>>> off to the source, and then the source will just stop ingesting.
>>>>>
>>>>> It is possible to hack around this by sending the tail -f output into
>>>>> another file and then custom-rotating that duplicate file.
>>>>> But I wouldn't recommend it.
>>>>>
>>>>> Just a side note: if you're operating a Java application (Tomcat or
>>>>> similar), then you can create multiple output files via log4j.properties
>>>>> configuration without the application itself knowing anything about it.
>>>>>
>>>>> Regards,
>>>>> Ahmed
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>> saran0081986@gmail.com> wrote:
>>>>>
>>>>>> Ahmed,
>>>>>>
>>>>>> In my case, the application renames the existing file to
>>>>>> <logfile>.yesterdaydate and creates a new file named <logfile> at 00:00.
>>>>>>
>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>
>>>>>> Can you suggest any options other than the spooling directory
>>>>>> source?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <avila@devlogic.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>> producing the log file handles it.
>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>> signal sent with kill. For example, nginx reopens its log file when it
>>>>>>> receives the USR1 signal, but it doesn't stop the process. Some
>>>>>>> applications might restart as a result.
>>>>>>>
>>>>>>> If the application just reopens the log file, then you can change
>>>>>>> your log rotation policy to be per minute.
>>>>>>> The logrotate daemon won't cover that case, so you'll have to make
>>>>>>> a cron job to do it.
>>>>>>> In that case, you would keep the finished-logs location separate from
>>>>>>> the live log location, so the spooling directory source doesn't freak
>>>>>>> out about the active log file being appended to.
>>>>>>>
>>>>>>> Anyway, the spooling directory source is the way to go, as it will
>>>>>>> leave log files in place, just renamed.
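
The per-minute cron job described above could look roughly like the following
sketch. It is an illustration only: rotate_into_spool, its parameters, and the
timestamp naming are hypothetical. It assumes the application reopens its log
file on USR1 (as nginx does) and that the spool directory is on the same
filesystem as the live log, so that each rename is atomic.

```python
import os
import signal
import time


def rotate_into_spool(live_log, spool_dir, pid=None, settle_seconds=1.0):
    """Rotate live_log and move the finished file into spool_dir.

    Returns the path of the spooled file, or None if there was nothing
    to rotate.
    """
    if not os.path.exists(live_log) or os.path.getsize(live_log) == 0:
        return None
    os.makedirs(spool_dir, exist_ok=True)

    # Step 1: rename next to the live log. The application keeps writing
    # to the renamed file until it reopens its log.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    staged = f"{live_log}.{stamp}"
    os.rename(live_log, staged)

    # Step 2: ask the application to reopen its log file, then give it a
    # moment to do so. The fixed pause is a crude assumption, not a guarantee.
    if pid is not None:
        os.kill(pid, signal.SIGUSR1)
        time.sleep(settle_seconds)

    # Step 3: only now move the finished file where the spooling directory
    # source can see it, so it never sees a file that is still growing.
    target = os.path.join(spool_dir, os.path.basename(staged))
    os.rename(staged, target)
    return target
```

A cron entry would call this once a minute with the application's PID, and
the spooling directory source would point at spool_dir.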
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of my setup:
>>>>>>>>
>>>>>>>> Source: exec, running tail -F on a logfile.
>>>>>>>>
>>>>>>>> Channel: file channel
>>>>>>>>
>>>>>>>> Sink: HDFS
>>>>>>>>
>>>>>>>> Use case: move real-time data from the logfile to HDFS.
>>>>>>>>
>>>>>>>>
>>>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>>>> if the channel or source is down.
>>>>>>>>
>>>>>>>>
>>>>>>>> So I tried another option, the "spooling directory source", which is
>>>>>>>> described as a reliable source. But here I have a single logfile that
>>>>>>>> data keeps getting appended to, so I don't see a way to move the file
>>>>>>>> to the spool directory.
>>>>>>>>
>>>>>>>>
>>>>>>>> Can anyone suggest another reliable source option for the case where
>>>>>>>> data is appended to a logfile and logfile rotation happens only at
>>>>>>>> the end of the day?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Saravana
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> This e-mail and any attachment is for authorised use by the intended
>>>>>>> recipient(s) only. This email contains confidential information. It should
>>>>>>> not be copied, disclosed to, retained or used by, any party other than the
>>>>>>> intended recipient. Any unauthorised distribution, dissemination or copying
>>>>>>> of this E-mail or its attachments, and/or any use of any information
>>>>>>> contained in them, is strictly prohibited and may be illegal. If you are
>>>>>>> not an intended recipient then please promptly delete this e-mail and any
>>>>>>> attachment and all copies and inform the sender directly via email. Any
>>>>>>> emails that you send to us may be monitored by systems or persons other
>>>>>>> than the named communicant for the purposes of ascertaining whether the
>>>>>>> communication complies with the law and company policies.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail   : avila@devlogic.eu
>
