flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: File channel performance on a single disk is poor
Date Mon, 09 Jul 2012 06:14:28 GMT
Hi, thanks for your input.

On 07/09/2012 02:42 PM, Arvind Prabhakar wrote:
> Hi,
>
> > It's certainly one possible solution to the issue, though I do
> > believe that the current one could be made more friendly
> > towards single disk access (e.g. batching writes to the disk
> > may well be doable, and I would be curious what someone
> > with more familiarity with the implementation thinks).
>
> The implementation of the file channel is that of a write-ahead log, 
> in that it serializes all the actions as they happen. Using these 
> actions, it can reconstruct the state of the channel at any time. There 
> are two mutually exclusive transaction types it supports - a 
> transaction consisting of puts, and one consisting of takes. It may be 
> possible to use the heap to batch the puts and takes and serialize 
> them to disk when the commit occurs.
>
> This approach will minimize the number of disk operations and will 
> have an impact on the performance characteristics of the channel. 
> Although it probably will improve performance, it is hard to tell for 
> sure unless we test it out under load in different scenarios.
>

This does sound a lot better to me. I'm not sure there is much demand 
for restoring the state of an uncommitted set of puts/takes to a file 
channel after restarting an agent? If the transaction wasn't completed, 
its current state is not really going to be important after a restart. 
I'm really not familiar with WAL implementations, but is it not enough 
to write the data to be committed before the commit marker/before 
informing of success? I don't think it is necessary to write each 
piece as it comes in, so long as it is done before informing of 
success/failure.
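
To make that concrete, here is a minimal sketch of the kind of group 
commit I have in mind (the class and record layout are hypothetical, 
not the actual FileChannel internals): puts are staged on the heap and 
only serialized, together with the commit marker, when the transaction 
commits, so each transaction costs one sequential write and one fsync 
rather than one of each per event.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: stage a transaction's puts in memory, then
    // write them to the log in one sequential batch at commit time,
    // followed by the commit marker and a single fsync.
    class BatchingWalTransaction {
        private static final byte OP_PUT = 1;
        private static final byte OP_COMMIT = 2;

        private final FileChannel log;
        private final List<byte[]> pendingPuts = new ArrayList<byte[]>();

        BatchingWalTransaction(Path logFile) throws IOException {
            this.log = FileChannel.open(logFile, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        // Stage an event on the heap; nothing touches the disk yet.
        void put(byte[] event) {
            pendingPuts.add(event);
        }

        // Serialize all staged puts plus the commit record, then sync once.
        void commit() throws IOException {
            for (byte[] event : pendingPuts) {
                ByteBuffer header = ByteBuffer.allocate(5);
                header.put(OP_PUT).putInt(event.length);
                header.flip();
                log.write(header);
                log.write(ByteBuffer.wrap(event));
            }
            ByteBuffer commit = ByteBuffer.allocate(1);
            commit.put(OP_COMMIT);
            commit.flip();
            log.write(commit);
            log.force(false); // one fsync per transaction, not per event
            pendingPuts.clear();
        }

        // Rollback is free: the staged events were never written.
        void rollback() {
            pendingPuts.clear();
        }
    }

On replay, anything after the last commit marker would simply be 
discarded, so an uncommitted transaction costs nothing to recover 
after a restart.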

Another matter I'm curious about is whether or not we actually need 
separate files for the data and checkpoints... Can we not add a magic 
header before each type of entry to differentiate them, and thus 
guarantee significantly more sequential access? What is killing 
performance on a single disk right now is the constant seeking. The 
problem with this, though, would be putting together a file format 
that still allows seeking quickly to the correct position, and rolling 
files would be a lot harder. I think this is a lot more difficult and 
might be more of a long-term target.
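
As a strawman for what that might look like (the tags and layout here 
are purely illustrative, not a format proposal): with a one-byte tag 
and a length on every entry, data and checkpoint records could share 
one sequentially written file, and a scanner could skip entries 
without deserializing them - though finding an arbitrary position 
still means walking the entries one by one, which is exactly the 
seeking problem above.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Illustrative framing for a single interleaved log: every entry
    // is a one-byte tag plus a length-prefixed payload.
    class InterleavedLogScanner {
        static final byte TAG_DATA = (byte) 0xD0;       // event payload
        static final byte TAG_CHECKPOINT = (byte) 0xC0; // state snapshot

        // Scan forward from pos and return the offset of the next
        // checkpoint entry, or -1 if there is none.
        static long findNextCheckpoint(FileChannel log, long pos)
                throws IOException {
            ByteBuffer header = ByteBuffer.allocate(5); // tag + length
            while (pos < log.size()) {
                header.clear();
                if (log.read(header, pos) < 5) {
                    break; // truncated tail, e.g. a crash mid-write
                }
                header.flip();
                byte tag = header.get();
                int len = header.getInt();
                if (tag == TAG_CHECKPOINT) {
                    return pos;
                }
                pos += 5 + len; // hop over the payload without reading it
            }
            return -1;
        }
    }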

Juhani

> Regards,
> Arvind Prabhakar
>
>
> On Wed, Jul 4, 2012 at 3:33 AM, Juhani Connolly 
> <juhani_connolly@cyberagent.co.jp> wrote:
>
>     It looks good to me as it provides a nice balance between
>     reliability and throughput.
>
>     It's certainly one possible solution to the issue, though I do
>     believe that the current one could be made more friendly towards
>     single disk access (e.g. batching writes to the disk may well be
>     doable, and I would be curious what someone with more familiarity
>     with the implementation thinks).
>
>
>     On 07/04/2012 06:36 PM, Jarek Jarcec Cecho wrote:
>
>         We had a connected discussion about this "SpillableChannel"
>         (working name) on FLUME-1045, and I believe the consensus is
>         that we will create something like that. In fact, I'm planning
>         to do it myself in the near future - I just need to prioritize
>         my todo list first.
>
>         Jarcec
>
>         On Wed, Jul 04, 2012 at 06:13:43PM +0900, Juhani Connolly wrote:
>
>             Yes... I was actually poking around for that issue, as I
>             remembered seeing it before. I had also previously
>             suggested a compound channel that would have worked like
>             the buffer store in Scribe, but the general opinion was
>             that it provided too many mixed configurations, which
>             could make testing and verifying correctness difficult.
>
>             On 07/04/2012 04:33 PM, Jarek Jarcec Cecho wrote:
>
>                 Hi Juhani,
>                 a while ago I filed JIRA FLUME-1227, where I
>                 suggested creating some sort of SpillableChannel that
>                 would behave similarly to Scribe. It would normally
>                 act as a memory channel, and it would start spilling
>                 data to disk in case it gets full (my primary goal
>                 here was to solve the issue of the remote going down,
>                 for example during HDFS maintenance). Would it be
>                 helpful for your case?
>
>                 Jarcec
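
For what it's worth, the behaviour described there might boil down to 
something like this toy sketch (the DiskStore interface is made up for 
illustration, not FLUME-1227's actual design): a bounded in-memory 
queue on the fast path, with overflow appended sequentially to disk 
and the spilled backlog drained in order.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Toy sketch of the spilling idea: a bounded in-memory queue on
    // the fast path, overflowing to a disk-backed store when full.
    class SpillableQueue {
        interface DiskStore { // hypothetical, for illustration only
            void append(byte[] event); // sequential write to spill file
            byte[] poll();             // oldest spilled event, or null
            boolean isEmpty();
        }

        private final int memoryCapacity;
        private final Queue<byte[]> memory = new ArrayDeque<byte[]>();
        private final DiskStore disk;

        SpillableQueue(int memoryCapacity, DiskStore disk) {
            this.memoryCapacity = memoryCapacity;
            this.disk = disk;
        }

        synchronized void put(byte[] event) {
            // Spill when memory is full, and keep spilling until the
            // backlog is drained so that FIFO ordering is preserved.
            if (memory.size() < memoryCapacity && disk.isEmpty()) {
                memory.add(event);
            } else {
                disk.append(event);
            }
        }

        synchronized byte[] take() {
            // Memory always holds the oldest undelivered events.
            byte[] event = memory.poll();
            return event != null ? event : disk.poll();
        }
    }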
>
>                 On Wed, Jul 04, 2012 at 04:07:48PM +0900, Juhani
>                 Connolly wrote:
>
>                     Evaluating Flume on some of our servers, the file
>                     channel seems very slow, likely because, like most
>                     typical web servers, ours have a single raided
>                     disk available for writing to.
>
>                     Quoted below is a suggestion from a previous issue
>                     where our poor throughput came up; it turns out
>                     that on multiple disks, file channel performance
>                     is great.
>                     On 06/27/2012 11:01 AM, Mike Percy wrote:
>
>                         We are able to push > 8000 events/sec (2KB per
>                         event) through a single file channel if you
>                         put the checkpoint on one disk and use 2 other
>                         disks for data dirs. Not sure what the limit
>                         is. This is using the latest trunk code. You
>                         may also need to add additional sinks to your
>                         channel to drain it faster; this is because
>                         sinks are single-threaded and sources are
>                         multithreaded.
>
>                         Mike
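
For reference, that layout corresponds to a file channel configuration 
along these lines (the agent and component names and the paths are 
invented for illustration; checkpointDir and dataDirs are the file 
channel's actual properties):

    agent.channels = fc
    agent.sinks = k1 k2

    # Checkpoint on one disk, data dirs on two others
    agent.channels.fc.type = file
    agent.channels.fc.checkpointDir = /disk1/flume/checkpoint
    agent.channels.fc.dataDirs = /disk2/flume/data,/disk3/flume/data

    # Multiple identical sinks draining the same channel, since each
    # sink runs single-threaded
    agent.sinks.k1.type = avro
    agent.sinks.k1.channel = fc
    agent.sinks.k1.hostname = collector.example.com
    agent.sinks.k1.port = 4141
    # k2 would be configured identically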
>
>                     For the case where the disks happen to be
>                     available on the server,
>                     that's fantastic, but I suspect that most use
>                     cases are going to be
>                     similar to ours, where multiple disks are not
>                     available. Our use
>                     case isn't unusual as it's primarily aggregating
>                     logs from various
>                     services.
>
>                     We originally ran our log servers with an
>                     exec(tail)->file->avro setup, where throughput was
>                     very bad (80mb in an hour). We then switched this
>                     to a memory channel, which was fine (the peak-time
>                     500mb worth of hourly logs went through).
>                     Afterwards we switched back to the file channel,
>                     but with 5 identical avro sinks. This did not
>                     improve throughput (still 80mb).
>                     RecoverableMemoryChannel showed very similar
>                     characteristics.
>
>                     I presume this is due to the writes going to two
>                     separate places, compounded further by our also
>                     writing out and tailing the normal web logs:
>                     checking top and iostat, we could confirm we have
>                     significant iowait time, far more than we have
>                     during typical operation.
>
>                     As it is, we seem to be more or less guaranteeing
>                     no loss of logs with the file channel. Perhaps we
>                     could look into batching puts/takes for those who
>                     do not need 100% data retention but want more
>                     reliability than with the MemoryChannel, which can
>                     potentially lose its entire capacity on a restart?
>                     Another possibility is writing an implementation
>                     that writes primarily sequentially. I've been
>                     meaning to take a deeper look at the
>                     implementation itself to give a more informed
>                     commentary on its contents, but unfortunately I
>                     don't have the cycles right now; hopefully someone
>                     with a better understanding of the current
>                     implementation (along with its interaction with
>                     the OS file cache) can comment on this.
>


