flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shangan Chen <chenshangan...@gmail.com>
Subject Re: file channel read performance impacted by write rate
Date Tue, 17 Dec 2013 15:32:50 GMT
the attachment flume.conf is channel and sink config, dumps.txt is thread
dumps.
channel type "dual" is a channel type I developped to utilize the merits of
memory channel and filechannel. when the volume is not quite big, I use
memory channel, when the size of events reach to a percentage of the memory
channel capacity, it switch to the filechannel, when volume decrease switch
to memory again.

thanks for looking into this.


On Tue, Dec 17, 2013 at 8:54 PM, Brock Noland <brock@cloudera.com> wrote:

> Can you take and share 8-10 thread dumps while the sink is taking events
> "slowly"?
>
> Can you share your machine and file channel configuration?
> On Dec 17, 2013 6:28 AM, "Shangan Chen" <chenshangan521@gmail.com> wrote:
>
>> we face the same problem, performance of taking events from channel is a
>> severe bottleneck. When there're less events in channel,  problem does not
>> alleviate. following is a log of the metrics of writing to hdfs, writing to
>> 5 files with a batchsize of 200000, take cost the most of the total time.
>>
>>
>> 17 十二月 2013 18:49:28,056 INFO
>>  [SinkRunner-PollingRunner-DefaultSinkProcessor]
>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:489)  -
>> HdfsSink-TIME-STAT sink[sink_hdfs_b] writers[5] eventcount[200000]
>> all[44513] take[38197] append[5647] sync[17] getFilenameTime[371]
>>
>>
>>
>>
>>
>> On Mon, Nov 25, 2013 at 4:46 PM, Jan Van Besien <janvb@ngdata.com> wrote:
>>
>>> Hi,
>>>
>>> Is anybody still looking into this question?
>>>
>>> Should I log it in jira such that somebody can look into it later?
>>>
>>> thanks,
>>> Jan
>>>
>>>
>>>
>>> On 11/18/2013 11:28 AM, Jan Van Besien wrote:
>>> > Hi,
>>> >
>>> > Sorry it took me a while to answer this. I compiled a small test case
>>> > using only off the shelve flume components that shows what is going on.
>>> >
>>> > The setup is a single agent with http source, null sink and file
>>> > channel. I am using the default configuration as much as possible.
>>> >
>>> > The test goes as follows:
>>> >
>>> > - start the agent without sink
>>> > - run a script that sends http requests in multiple threads to the http
>>> > source (the script simply calls the url
>>> http://localhost:8080/?key=value
>>> > over and over a gain, whereby value is a random string of 100 chars).
>>> > - this script does about 100 requests per second on my machine. I leave
>>> > it running for a while, such that the file channel contains about 20000
>>> > events.
>>> > - add the null sink to the configuration (around 11:14:33 in the log).
>>> > - observe the logging of the null sink. You'll see in the log file that
>>> > it takes more than 10 seconds per 1000 events (until about even 5000,
>>> > around 11:15:33)
>>> > - stop the http request generating script (i.e. no more writing in file
>>> > channel)
>>> > - observer the logging of the null sink: events 5000 until 20000 are
>>> all
>>> > processed within a few seconds.
>>> >
>>> > In the attachment:
>>> > - flume log
>>> > - thread dumps while the ingest was running and the null sink was
>>> enabled
>>> > - config (agent1.conf)
>>> >
>>> > I also tried with more sinks (4), see agent2.conf. The results are the
>>> same.
>>> >
>>> > Thanks for looking into this,
>>> > Jan
>>> >
>>> >
>>> > On 11/14/2013 05:08 PM, Brock Noland wrote:
>>> >> On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <janvb@ngdata.com
>>> >> <mailto:janvb@ngdata.com>> wrote:
>>> >>
>>> >>      On 11/13/2013 03:04 PM, Brock Noland wrote:
>>> >>       > The file channel uses a WAL which sits on disk.  Each time
an
>>> >>      event is
>>> >>       > committed an fsync is called to ensure that data is durable.
>>> Without
>>> >>       > this fsync there is no durability guarantee. More details
>>> here:
>>> >>       > https://blogs.apache.org/flume/entry/apache_flume_filechannel
>>> >>
>>> >>      Yes indeed. I was just not expecting the performance impact to
be
>>> >>      that big.
>>> >>
>>> >>
>>> >>       > The issue is that when the source is committing one-by-one
>>> it's
>>> >>       > consuming the disk doing an fsync for each event.  I would
>>> find a
>>> >>      way to
>>> >>       > batch up the requests so they are not written one-by-one
or
>>> use
>>> >>      multiple
>>> >>       > disks for the file channel.
>>> >>
>>> >>      I am already using multiple disks for the channel (4).
>>> >>
>>> >>
>>> >> Can you share your configuration?
>>> >>
>>> >>      Batching the
>>> >>      requests is indeed what I am doing to prevent the filechannel to
>>> be the
>>> >>      bottleneck (using a flume agent with a memory channel in front
>>> of the
>>> >>      agent with the file channel), but it inheritely means that I
>>> loose
>>> >>      end-to-end durability because events are buffered in memory
>>> before being
>>> >>      flushed to disk.
>>> >>
>>> >>
>>> >> I would be curious to know though if you doubled the sinks if that
>>> would
>>> >> give more time to readers. Could you take three-four thread dumps of
>>> the
>>> >> JVM while it's in this state and share them?
>>> >>
>>> >
>>>
>>>
>>
>>
>> --
>> have a good day!
>> chenshang'an
>>
>>


-- 
have a good day!
chenshang'an

Mime
View raw message