flume-user mailing list archives

From Kamal Bahadur <mailtoka...@gmail.com>
Subject Re: flume dying on InterruptException (nanos)
Date Thu, 20 Oct 2011 14:48:40 GMT
I agree with Prasad's solution. Since we are all going to use different backends
(I use Cassandra) to store data, we cannot have a fixed timeout there.
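
[As a hedged sketch, one way to avoid a fixed timeout; the property name,
default, and surrounding names below are hypothetical, not Flume's actual
configuration:]

    // Let each deployment tune the roll-lock timeout to its backend's
    // latency (HDFS, HBase, Cassandra, ...).
    long timeoutMs = conf.getLong("flume.collector.roll.lock.timeout.ms",
        1000L); // property name and default are assumptions
    if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      LOG.warn("Lock not acquired within " + timeoutMs + "ms, deferring roll");
      return;
    }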

Thanks,
Kamal

On Wed, Oct 19, 2011 at 6:08 PM, Prasad Mujumdar <prasadm@cloudera.com> wrote:

>
>   hmm ... I am wondering if the Trigger thread should just bail out without
> resetting the trigger if it can't get hold of the lock in 1 sec. The next
> append or the next trigger should take care of rotating the files ..
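>
> [For illustration, a minimal Java sketch of this idea; the field and
> method names below are hypothetical, only tryLock() appears in the stack
> trace further down:]
>
>     // imports assumed: java.util.concurrent.TimeUnit,
>     //                  java.util.concurrent.locks.ReentrantReadWriteLock
>     void rotate() throws IOException, InterruptedException {
>       if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
>         // Bail out *without* resetting the trigger; the next append or
>         // the next trigger firing will retry the roll.
>         LOG.warn("Could not get write lock within 1s, deferring roll");
>         return;
>       }
>       try {
>         curSink.close();        // close the current output file
>         curSink = newSink(ctx); // open the next one
>         trigger.reset();        // reset only after a successful roll
>       } finally {
>         lock.writeLock().unlock();
>       }
>     }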
>
> thanks
> Prasad
>
>
> On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cgandevia@gmail.com> wrote:
>
>> We recently modified the RollSink to hide our problem by giving it a few
>> seconds to finish writing before rolling. We are going to test it out and if
>> it fixes our issue we will provide a patch later today.
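>>
>> [Roughly the shape of such a change, as a hedged sketch; the 5-second
>> grace value and the names here are assumptions, not from the Flume
>> source:]
>>
>>     // Give an in-flight write a few seconds to finish before the roll
>>     // gives up, instead of the hard-coded 1 second.
>>     static final long ROLL_GRACE_MS = 5000; // assumed value
>>
>>     if (!lock.writeLock().tryLock(ROLL_GRACE_MS, TimeUnit.MILLISECONDS)) {
>>       LOG.warn("Writer still busy after " + ROLL_GRACE_MS
>>           + "ms, skipping this roll");
>>       return;
>>     }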
>>  On Oct 19, 2011 1:27 PM, "AD" <straightflush@gmail.com> wrote:
>>
>>> Yeah, I am using the HBase sink, so I guess it's possible something is
>>> getting hung up there and causing the collector to die. The number of
>>> file descriptors seems safely under the limit.
>>>
>>> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cgandevia@gmail.com> wrote:
>>>
>>>> We were seeing the same issue when our HDFS instance was overloaded and
>>>> taking over a second to respond. I assume that if the backend is down,
>>>> the collector will die and need to be restarted when the backend becomes
>>>> available again? That doesn't seem very reliable.
>>>>
>>>>
>>>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ralph.goers@dslextreme.com> wrote:
>>>>
>>>>> We saw this problem when it was taking more than 1 second for a
>>>>> response from writing to Cassandra (our back end).  A single long
>>>>> response will kill the collector.  We had to revert back to the
>>>>> version of Flume that uses synchronization instead of read/write
>>>>> locking to get around this.
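>>>>>
>>>>> [A simplified contrast of the two locking styles, not actual Flume
>>>>> code: the synchronized close() blocks until the in-flight write
>>>>> returns, however long the backend takes, while the read/write-lock
>>>>> close() gives up after one second and the collector dies.]
>>>>>
>>>>>     // Synchronized style: waits out the slow backend response.
>>>>>     public synchronized void close() throws IOException {
>>>>>       curSink.close();
>>>>>     }
>>>>>
>>>>>     // Read/write-lock style: one response slower than 1s makes
>>>>>     // tryLock fail (or get interrupted) and close() throws.
>>>>>     public void close() throws IOException, InterruptedException {
>>>>>       if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
>>>>>         throw new IOException("Unable to acquire write lock to close");
>>>>>       }
>>>>>       try {
>>>>>         curSink.close();
>>>>>       } finally {
>>>>>         lock.writeLock().unlock();
>>>>>       }
>>>>>     }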
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>>>
>>>>> > Hello,
>>>>> >
>>>>> >  My collector keeps dying with the following error. Is this a known
>>>>> issue? Any idea how to prevent it or find out what is causing it? Is
>>>>> format("%{nanos}:") an issue?
>>>>> >
>>>>> > 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
>>>>> > java.lang.InterruptedException
>>>>> >       at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>>>> >       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>>>> >       at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>>>> >       at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>>> >       at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>>> >
>>>>> >
>>>>> > source:  collectorSource("35853")
>>>>> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>>>>> >   format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>>>>> >   split(":",0,"hbase_node") digest("MD5","hbase_md5")
>>>>> >   collector(10000) { attr2hbase("apache_logs","f1","","hbase_") }
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>>
>>>> Cameron Gandevia
>>>>
>>>
>>>
>
