flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Flume startup takes ~ hour
Date Tue, 24 Sep 2013 18:15:11 GMT
That is actually a symptom of the real problem. The real problem is that the remove method
ends up hitting the main checkpoint data structure and causes too many ops on the hash map.
The real fix is in the patch I mentioned, which reduces the number of ops tremendously.


Thanks,
Hari


On Tuesday, September 24, 2013 at 6:12 AM, Anat Rozenzon wrote:

> For example this stack trace:
> 
> 
> "lifecycleSupervisor-1-2" prio=10 tid=0x00007f89141d8800 nid=0x5ac8 runnable [0x00007f89501ad000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Integer.valueOf(Integer.java:642)
>         at org.apache.flume.channel.file.EventQueueBackingStoreFile.get(EventQueueBackingStoreFile.java:310)
>         at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
>         at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
>         - locked <0x00000006890f68f0> (a org.apache.flume.channel.file.FlumeEventQueue)
>         at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
>         at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
>         at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
>         at org.apache.flume.channel.file.Log.replay(Log.java:430)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
>         - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>         - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> 
> 
> 
> On Tue, Sep 24, 2013 at 4:10 PM, Anat Rozenzon <anat@viber.com> wrote:
> > After a deeper dive, it seems that the problem is with the HashMap usage in EventQueueBackingStoreFile.
> > 
> > Almost every time I run jstack, the JVM is inside EventQueueBackingStoreFile.get(), doing either HashMap.containsKey() or Integer.valueOf().
> > This is because overwriteMap is defined as a regular HashMap<Integer, Long>.
> > 
> > Does your fix solve this issue?
> > 
> > I think using a Long[] might be better.
> > 
> > 
> > On Tue, Sep 24, 2013 at 2:34 PM, Anat Rozenzon <anat@viber.com> wrote:
> > > Thanks Hari, great news, I'll be glad to test it.
> > > 
> > > However, I don't have an environment with trunk. Is there any way I can get it packaged somehow?
> > > 
> > > 
> > > On Mon, Sep 23, 2013 at 8:50 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> > > > How many events does the File Channel get every 30 seconds and how many get taken out? This is one of the edge cases of the File Channel I have been working on ironing out. There is a patch on https://issues.apache.org/jira/browse/FLUME-2155 (the FLUME-2155-initial.patch file). If you have data that takes an hour to start, and don't mind testing out this patch (this might be buggy, cause data loss, hangs etc - so testing in prod is not recommended), apply this patch to trunk and test it out, and see if it improves the startup time.
> > > > 
> > > > 
> > > > Thanks,
> > > > Hari
> > > > 
> > > > 
> > > > On Monday, September 23, 2013 at 9:16 AM, Anat Rozenzon wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > I have a flume instance that is collecting logs from several flume agents using avro source and file channel.
> > > > > Recently, when I'm restarting the collector it takes about an hour to start listening on the avro port.
> > > > > PSB a jstack entry, any idea why the startup is slow?
> > > > > 
> > > > > Thanks
> > > > > Anat
> > > > > 
> > > > > "lifecycleSupervisor-1-0" prio=10 tid=0x00007f01505e4800 nid=0x4c78 runnable [0x00007f01441d6000]
> > > > >    java.lang.Thread.State: RUNNABLE
> > > > >         at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
> > > > >         at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
> > > > >         - locked <0x0000000689149c30> (a org.apache.flume.channel.file.FlumeEventQueue)
> > > > >         at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
> > > > >         at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
> > > > >         at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
> > > > >         at org.apache.flume.channel.file.Log.replay(Log.java:430)
> > > > >         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
> > > > >         - locked <0x0000000689145ca8> (a org.apache.flume.channel.file.FileChannel)
> > > > >         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
> > > > >         - locked <0x0000000689145ca8> (a org.apache.flume.channel.file.FileChannel)
> > > > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > >         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> > > > >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> > > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> > > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > >         at java.lang.Thread.run(Thread.java:724)
> > > > > 
> > > > 
> > > 
> > 
> 
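
The thread's observation about overwriteMap (a HashMap<Integer, Long> whose lookups show up in jstack as HashMap.containsKey() and Integer.valueOf()) can be illustrated with a small standalone sketch. This is not Flume's code and not the FLUME-2155 patch: the class OverlayLookupSketch, its method names, and the EMPTY sentinel are hypothetical, and it uses a primitive long[] rather than the Long[] suggested in the thread, on the assumption that one reserved value can mark "not overwritten".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; it does not reproduce EventQueueBackingStoreFile.
class OverlayLookupSketch {
    // Assumed sentinel meaning "slot not overwritten"; must never be a real value.
    static final long EMPTY = Long.MIN_VALUE;

    // Boxed variant: each call boxes the index and performs two hash lookups.
    static long getBoxed(Map<Integer, Long> overwriteMap, long[] base, int index) {
        Integer key = Integer.valueOf(index);
        if (overwriteMap.containsKey(key)) {
            return overwriteMap.get(key);
        }
        return base[index];
    }

    // Primitive variant: one array read and a comparison, with no allocation.
    static long getPrimitive(long[] overlay, long[] base, int index) {
        long value = overlay[index];
        return value != EMPTY ? value : base[index];
    }

    public static void main(String[] args) {
        int capacity = 1_000_000;
        long[] base = new long[capacity];
        long[] overlay = new long[capacity];
        Arrays.fill(overlay, EMPTY);
        Map<Integer, Long> overwriteMap = new HashMap<>();

        for (int i = 0; i < capacity; i++) {
            base[i] = i;
            if (i % 2 == 0) { // overwrite every other slot in both structures
                overwriteMap.put(i, (long) i + capacity);
                overlay[i] = (long) i + capacity;
            }
        }

        long start = System.nanoTime();
        long boxedSum = 0;
        for (int i = 0; i < capacity; i++) {
            boxedSum += getBoxed(overwriteMap, base, i);
        }
        long boxedNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long primitiveSum = 0;
        for (int i = 0; i < capacity; i++) {
            primitiveSum += getPrimitive(overlay, base, i);
        }
        long primitiveNanos = System.nanoTime() - start;

        System.out.println("results match: " + (boxedSum == primitiveSum));
        System.out.println("boxed map ns: " + boxedNanos);
        System.out.println("primitive array ns: " + primitiveNanos);
    }
}
```

The timings are a rough, unwarmed measurement and will vary by JVM. The array variant also costs 8 bytes per slot regardless of how many slots are overwritten, and, as noted at the top of the thread, the lookup cost is only a symptom: the patch aims to reduce the number of operations rather than make each one cheaper.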

