flume-user mailing list archives

From no jihun <jees...@gmail.com>
Subject Re: File left as OPEN_FOR_WRITE state.
Date Wed, 20 Jul 2016 08:43:32 GMT
>
> In fact, looking at your error, the timeout looks like hdfs.callTimeout,
> so that's where I'd focus. Is your HDFS cluster particularly underperforming?
> 10 s to respond to a call is pretty slow.

You are right.

At that time, the HDFS disks were fully utilized by MapReduce jobs.
I expected that even if Flume failed to close a file, it would retry the
close a while later, once the disks were less busy, and the file would
then be closed successfully.
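
For reference, the timeout and retry settings discussed in this thread can be set explicitly on the sink. This is a sketch, not a tested recommendation: the property names come from the HDFS sink section of the Flume User Guide linked in this thread, the 10000 ms callTimeout default matches the timeout shown in the quoted log, the other defaults noted in the comments are as documented for Flume at the time and may differ in other versions, and the 60000 ms value is an illustrative assumption.

```properties
# Sketch only; values are illustrative assumptions, tune for your cluster.
# Milliseconds allowed for each HDFS call (open, write, flush, close).
# The failed close in the quoted log hit the 10000 ms default.
hadoop2.sinks.hdfs2.hdfs.callTimeout = 60000
# Seconds between consecutive close attempts (documented default: 180).
hadoop2.sinks.hdfs2.hdfs.retryInterval = 180
# Rename attempts after a close is initiated; 0 means keep trying
# (documented default: 0).
hadoop2.sinks.hdfs2.hdfs.closeTries = 0
```

Raising callTimeout only gives each call more time while the disks are saturated; it does not by itself guarantee that a close which still times out will be retried, which is the open question in this thread.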

2016-07-20 17:36 GMT+09:00 no jihun <jeesim2@gmail.com>:

> I know about idleTimeout, rollSize, and rollCount (which control rolling
> over the file being written).
>
> I didn't set callTimeout, so the default of 10 s applies.
> I haven't set closeTries or retryInterval either.
>
> So I think that even if a close fails once, it should be retried after
> 180 s (the default retryInterval).
> But as you can see in the logs above, the close was never retried.
>
> Am I wrong?
>
> 2016-07-20 17:25 GMT+09:00 Chris Horrocks <chris@hor.rocks>:
>
>> You could look at tuning either hdfs.idleTimeout, hdfs.callTimeout, or
>> hdfs.retryInterval which can all be found at:
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>>
>> --
>> Chris Horrocks
>>
>>
>> On Wed, Jul 20, 2016 at 9:01 am, no jihun <'jeesim2@gmail.com'> wrote:
>>
>> @Chris If you meant hdfs.callTimeout, I am testing that now.
>>
>> I can increase the value.
>> But when a timeout occurs during close, will the close never be retried
>> (as in the logs above)?
>>
>> 2016-07-20 16:50 GMT+09:00 Chris Horrocks <chris@hor.rocks>:
>>
>>> Have you tried increasing the HDFS sink timeouts?
>>>
>>> --
>>> Chris Horrocks
>>>
>>>
>>> On Wed, Jul 20, 2016 at 8:03 am, no jihun <'jeesim2@gmail.com'> wrote:
>>>
>>> Hi.
>>>
>>> I found some files on HDFS left in the OPEN_FOR_WRITE state.
>>>
>>> *This is Flume's log for the file:*
>>>
>>>
>>> 18 7 2016 16:12:02,765 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating 1468825922758.avro.tmp
>>> 18 7 2016 16:22:39,812 INFO  [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter 1468825922758.avro.tmp at 1468826559812
>>> 18 7 2016 16:22:39,812 INFO  [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing 1468825922758.avro.tmp
>>> 18 7 2016 16:22:49,813 WARN  [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (1468825922758.avro.tmp). Exception follows.
>>> java.io.IOException: Callable timed out after 10000 ms on file: 1468825922758.avro.tmp
>>> 18 7 2016 16:22:49,816 INFO  [hdfs-hdfs2-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming 1468825922758.avro.tmp to 1468825922758.avro
>>>
>>> - It seems the close was never retried.
>>> - Flume just renamed the file while it was still open.
>>>
>>>
>>> *Two days later, I found that file with this command:*
>>>
>>> hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" |
>>>   grep "2016/07/18" | sed 's/\/data\/flume\// \/data\/flume\//g' |
>>>   grep -v ".avro.tmp" | sed -n 's/.*\(\/data\/flume\/.*avro\).*/\1/p'
>>>
>>>
>>>
>>> *So I recovered the lease:*
>>>
>>> hdfs debug recoverLease -path 1468825922758.avro -retries 3
>>>> recoverLease returned false.
>>>> Retrying in 5000 ms...
>>>> Retry #1
>>>> recoverLease SUCCEEDED on 1468825922758.avro
>>>
>>>
>>>
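
A minimal sketch of the same two steps (find finalized .avro files that fsck still reports as OPENFORWRITE, then recover their leases) as reusable shell functions. Assumptions not taken from the thread: the function names extract_open_avro_paths and recover_leases are hypothetical, the files live under /data/flume, and each relevant fsck line contains the path (possibly fused to fsck's progress dots) followed by a space and the word OPENFORWRITE. The date filter from the original command is left out. The hdfs fsck -openforwrite and hdfs debug recoverLease invocations are the ones used above.

```shell
#!/usr/bin/env bash
# Sketch only: names and the /data/flume layout are assumptions.

# Read `hdfs fsck <dir> -openforwrite` output on stdin and print one path
# per line for finalized .avro files (in-progress .avro.tmp files skipped).
extract_open_avro_paths() {
  grep 'OPENFORWRITE' |
    grep -v '\.avro\.tmp' |
    sed -n 's|.*\(/data/flume/[^ ]*\.avro\).*|\1|p'
}

# Ask the NameNode to recover the lease on each path read from stdin,
# which closes the file on the HDFS side.
recover_leases() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip blank lines
    hdfs debug recoverLease -path "$path" -retries 3
  done
}

# Usage (requires an HDFS client on PATH):
#   hdfs fsck /data/flume -openforwrite | extract_open_avro_paths | recover_leases
```

Recovering a lease closes the file from the NameNode side; it does not make Flume retry its own close, so this is a cleanup step rather than a fix for the timeouts themselves.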
>>> *My hdfs sink configuration*
>>>
>>> hadoop2.sinks.hdfs2.type = hdfs
>>>> hadoop2.sinks.hdfs2.channel = fileCh1
>>>> hadoop2.sinks.hdfs2.hdfs.fileType = DataStream
>>>> hadoop2.sinks.hdfs2.serializer = ....
>>>> hadoop2.sinks.hdfs2.serializer.compressionCodec = snappy
>>>> hadoop2.sinks.hdfs2.hdfs.filePrefix = %{type}_%Y-%m-%d_%{host}
>>>> hadoop2.sinks.hdfs2.hdfs.fileSuffix = .avro
>>>> hadoop2.sinks.hdfs2.hdfs.rollInterval = 3700
>>>> #hadoop2.sinks.hdfs2.hdfs.rollSize = 67000000
>>>> hadoop2.sinks.hdfs2.hdfs.rollSize = 800000000
>>>> hadoop2.sinks.hdfs2.hdfs.rollCount = 0
>>>> hadoop2.sinks.hdfs2.hdfs.batchSize = 10000
>>>> hadoop2.sinks.hdfs2.hdfs.idleTimeout = 300
>>>
>>>
>>> hdfs.closeTries and hdfs.retryInterval are both unset.
>>>
>>>
>>> *My question is:*
>>> Why was '1468825922758.avro' left OPEN_FOR_WRITE even though it was
>>> successfully renamed to .avro?
>>> Is this expected behavior? And what should I do to eliminate these
>>> anomalous OPENFORWRITE files?
>>>
>>> Regards,
>>> Jihun.
>>>
>>>
>>
>>
>> --
>> ----------------------------------------------
>> Jihun No ( 노지훈 )
>> ----------------------------------------------
>> Twitter          : @nozisim
>> Facebook       : nozisim
>> Website         : http://jeesim2.godohosting.com
>>
>> ---------------------------------------------------------------------------------
>> Market Apps   : android market products.
>> <https://market.android.com/developer?pub=%EB%85%B8%EC%A7%80%ED%9B%88>
>>
>>
>
>



