flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: File channel performance on a single disk is poor
Date Wed, 11 Jul 2012 02:01:46 GMT
Hi, thanks for clarifying.

On 07/10/2012 06:36 PM, Arvind Prabhakar wrote:
> Hi,
> On Sun, Jul 8, 2012 at 11:14 PM, Juhani Connolly 
> <juhani_connolly@cyberagent.co.jp 
> <mailto:juhani_connolly@cyberagent.co.jp>> wrote:
>     Another matter that I'm curious about is whether or not we actually
>     need separate files for the data and checkpoints...
> The data file and checkpoint files serve different purposes. The checkpoint 
> resides in memory and simulates the channel. The only difference is 
> that it does not store the data in the queue itself, but pointers to 
> data that resides in the log files. As a result the memory footprint 
> of the checkpoint is very small regardless of how big each event 
> payload is. This size only depends upon the capacity of the channel 
> and nothing else.
This is more or less what I expected. Am I correct in believing that 
each commit has to seek back and forth between two different files? 
That would make all access on a single disk non-sequential.
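
To make the quoted description concrete, here is a minimal sketch of a checkpoint as a bounded queue of pointers into the data files. It is an illustration only, not the FileChannel code: the class CheckpointSketch, the EventPointer type and its fileId/offset fields are all hypothetical names. The point it shows is that the queue's footprint depends on the capacity and not on the size of the event payloads.

```java
import java.util.ArrayDeque;

public class CheckpointSketch {
    // Hypothetical pointer type: where an event's bytes live in the data logs.
    static final class EventPointer {
        final int fileId;   // which data log file holds the event
        final long offset;  // byte offset of the event within that file

        EventPointer(int fileId, long offset) {
            this.fileId = fileId;
            this.offset = offset;
        }
    }

    private final ArrayDeque<EventPointer> queue = new ArrayDeque<>();
    private final int capacity;

    CheckpointSketch(int capacity) {
        this.capacity = capacity;
    }

    // Records a pointer to an event already written to a data file.
    // Returns false when the channel is at capacity.
    boolean put(int fileId, long offset) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(new EventPointer(fileId, offset));
        return true;
    }

    // Removes and returns the oldest pointer, or null if the queue is empty.
    EventPointer take() {
        return queue.pollFirst();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        CheckpointSketch checkpoint = new CheckpointSketch(2);
        checkpoint.put(1, 0L);
        checkpoint.put(1, 2048L);
        System.out.println("accepted when full: " + checkpoint.put(2, 0L));
        System.out.println("oldest offset: " + checkpoint.take().offset);
    }
}
```

In this model a put writes the payload to a data file and then updates the pointer queue, which is where the two-file access pattern asked about above would come from if the queue is also persisted on the same disk.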

>     Can we not add a magic header before each type of entry to
>     differentiate, and thus guarantee significantly more sequential
>     access?
> In the general case access will be sequential. In the best case, the 
> channel will have moved the writes to new log files and continue to do 
> reads from old (rolled) files, which reduces seek contention. From what 
> I know, I don't think it will be trivial to effect your suggested 
> change without significantly impacting the entire logic of the channel.

I don't understand how that reduces seek contention if the files are 
all on the same disk. I don't think the reads are that painful; a lot 
of that is hopefully taken care of by the OS cache.

Implementation would likely be difficult, yes. I've only skimmed the 
code, and haven't attempted the change for that reason. As you 
suggest, it might be better to have a separate implementation.
>     What is killing performance on a single disk right now is the
>     constant seeks. The problem with this though would be putting
>     together a file format that allows quick seeking through to the
>     correct position, and rolling would be a lot harder. I think this
>     is a lot more difficult and might be more of a long term target.
> Perhaps what you are describing is a different type of persistent 
> channel that is optimized for high latency IO systems. I would 
> encourage you to take your idea one step further and see if that can 
> be drafted as yet another channel that serves this particular use-case.

I'd like to do this, though it seems quite involved. Hopefully I can 
find some time to figure it out further down the road. Jarcec's 
spillable channel should also help on this front.

For the time being, I've resolved the issue for us with a workaround: 
limiting the number of commits (by making ExecSource commit multiple 
entries at a time).
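
As a rough illustration of why batching helps, here is a minimal sketch that counts how many forced disk syncs are needed for the same number of events at different batch sizes. It is not the ExecSource change itself: the class BatchedCommitSketch and the method countCommits are hypothetical, and it assumes that each commit costs one force() on the log file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BatchedCommitSketch {
    // Hypothetical helper: appends `events` lines to a temporary file and
    // forces them to disk once per `batchSize` events (one "commit").
    // Returns the number of commits, i.e. the number of force() calls.
    static int countCommits(int events, int batchSize) {
        int commits = 0;
        try {
            Path log = Files.createTempFile("batch-sketch", ".log");
            try (FileChannel ch = FileChannel.open(log, StandardOpenOption.WRITE)) {
                int pending = 0;
                for (int i = 0; i < events; i++) {
                    byte[] line = ("event " + i + "\n").getBytes(StandardCharsets.UTF_8);
                    ByteBuffer buf = ByteBuffer.wrap(line);
                    while (buf.hasRemaining()) {
                        ch.write(buf);
                    }
                    pending++;
                    if (pending == batchSize) {
                        ch.force(false); // commit: flush the whole batch at once
                        commits++;
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ch.force(false); // commit the final partial batch
                    commits++;
                }
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println("commits, 1 event each:   " + countCommits(100, 1));
        System.out.println("commits, 20 events each: " + countCommits(100, 20));
    }
}
```

With 100 events, a batch size of 1 needs 100 commits and a batch size of 20 needs 5. On a single disk, fewer commits should mean fewer forced syncs and seeks, at the cost of a larger window of uncommitted entries.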

My concern is that a number of people present FileChannel as having 
good performance, when at present that depends on one of two things 
being true: multiple disks, or batched transactions.

  Juhani Connolly

> Regards,
> Arvind Prabhakar
>     Juhani
>>     Regards,
>>     Arvind Prabhakar
>>     On Wed, Jul 4, 2012 at 3:33 AM, Juhani Connolly
>>     <juhani_connolly@cyberagent.co.jp
>>     <mailto:juhani_connolly@cyberagent.co.jp>> wrote:
>>         It looks good to me as it provides a nice balance between
>>         reliability and throughput.
>>         It's certainly one possible solution to the issue, though I
>>         do believe that the current one could be made more friendly
>>         towards single disk access(e.g. batching writes to the disk
>>         may well be doable and would be curious what someone with
>>         more familiarity with the implementation thinks).
>>         On 07/04/2012 06:36 PM, Jarek Jarcec Cecho wrote:
>>             We had connected discussion about this "SpillableChannel"
>>             (working name) on FLUME-1045 and I believe that consensus
>>             is that we will create something like that. In fact, I'm
>>             planning to do it myself in near future - I just need to
>>             prioritize my todo list first.
>>             Jarcec
>>             On Wed, Jul 04, 2012 at 06:13:43PM +0900, Juhani Connolly
>>             wrote:
>>                 Yes... I was actually poking around for that issue as
>>                 I remembered
>>                 seeing it before.  I had before also suggested a
>>                 compound channel
>>                 that would have worked like the buffer store in
>>                 scribe, but general
>>                 opinion was that it provided too many mixed
>>                 configurations that
>>                 could make testings and verifying correctness difficult.
>>                 On 07/04/2012 04:33 PM, Jarek Jarcec Cecho wrote:
>>                     Hi Juhally,
>>                     A while ago I filed JIRA FLUME-1227, where I
>>                     suggested creating some sort of SpillableChannel
>>                     that would behave similarly as scribe. It would
>>                     be normally acting as memory channel and it would
>>                     start spilling data to disk in case that it would
>>                     get full (my primary goal here was to solve issue
>>                     when remote goes down, for example in case of
>>                     HDFS maintenance). Would it be helpful for your case?
>>                     Jarcec
>>                     On Wed, Jul 04, 2012 at 04:07:48PM +0900, Juhani
>>                     Connolly wrote:
>>                         Evaluating flume on some of our servers, the
>>                         file channel seems very
>>                         slow, likely because like most typical web
>>                         servers ours have a
>>                         single raided disk available for writing to.
>>                         Quoted below is a suggestion from a  previous
>>                         issue where our poor
>>                         throughput came up, where it turns out that
>>                         on multiple disks, file
>>                         channel performance is great.
>>                         On 06/27/2012 11:01 AM, Mike Percy wrote:
>>                             We are able to push > 8000 events/sec
>>                             (2KB per event) through a single file
>>                             channel if you put checkpoint on one disk
>>                             and use 2 other disks for data dirs. Not
>>                             sure what the limit is. This is using the
>>                             latest trunk code. Other limitations may
>>                             be you need to add additional sinks to
>>                             your channel to drain it faster. This is
>>                             because sinks are single threaded and
>>                             sources are multithreaded.
>>                             Mike
>>                         For the case where the disks happen to be
>>                         available on the server,
>>                         that's fantastic, but I suspect that most use
>>                         cases are going to be
>>                         similar to ours, where multiple disks are not
>>                         available. Our use
>>                         case isn't unusual as it's primarily
>>                         aggregating logs from various
>>                         services.
>>                         We originally ran our log servers with an
>>                         exec(tail)->file->avro
>>                         setup where throughput was very bad (80 MB in
>>                         an hour). We then
>>                         switched this to a memory channel which was
>>                         fine (the peak-time 500 MB
>>                         worth of hourly logs went through).
>>                         Afterwards we switched back to
>>                         the file channel, but with 5 identical avro
>>                         sinks. This did not
>>                         improve throughput (still 80 MB).
>>                         RecoverableMemoryChannel showed very
>>                         similar characteristics.
>>                         I presume this is due to the writes going to
>>                         two separate places,
>>                         and being further compounded by also writing
>>                         out and tailing the
>>                         normal web logs: checking top and iostat, we
>>                         could confirm we have
>>                         significant iowait time, far more than we
>>                         have during typical
>>                         operation.
>>                         As it is, we seem to be more or less
>>                         guaranteeing no loss of logs
>>                         with the file channel. Perhaps we could look
>>                         into batching
>>                         puts/takes for those that do not need 100%
>>                         data retention but want
>>                         more reliability than with the MemoryChannel
>>                         which can potentially
>>                         lose the entire capacity on a restart?
>>                         Another possibility is
>>                         writing an implementation that writes
>>                         primarily sequentially. I've
>>                         been meaning to get a deeper look at the
>>                         implementation itself to
>>                         give a more informed commentary on the
>>                         contents but unfortunately
>>                         don't have the cycles right now, hopefully
>>                         someone with a better
>>                         understanding of the current
>>                         implementation(along with its
>>                         interaction with the OS file cache) can
>>                         comment on this.
