Subject: File Channel performance and fsync
From: Jagadish Bihani <jagadish.bihani@pubmatic.com>
Date: Mon, 22 Oct 2012 17:18:05 +0530
To: user@flume.apache.org
CC: Brock Noland
Hi

I am writing this on top of another thread, where there was a discussion about "fsync lies" and
the point that only the file channel uses fsync, not the file sink:

-- I tested fsync performance on 2 machines using the following code. (On 1 machine I was getting
very good throughput using the file channel, and on the other it was almost 100 times slower, with
almost the same hardware configuration.)


#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define PAGESIZE 4096

int main(int argc, char *argv[])
{
        char my_read_str[PAGESIZE];
        char *read_filename;
        int readfd, writefd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <input-file>\n", argv[0]);
                return 1;
        }
        read_filename = argv[1];

        readfd = open(read_filename, O_RDONLY);
        writefd = open("written_file", O_WRONLY | O_CREAT, 0644); /* mode must be octal */

        /* Find the input file size, then rewind to the start. */
        int len = lseek(readfd, 0, SEEK_END);
        lseek(readfd, 0, SEEK_SET);
        int iterations = len / PAGESIZE;
        int i;
        struct timeval t0, t1;

        for (i = 0; i < iterations; i++) {
                read(readfd, my_read_str, PAGESIZE);
                write(writefd, my_read_str, PAGESIZE);

                /* Time only the fsync call itself. */
                gettimeofday(&t0, 0);
                fsync(writefd);
                gettimeofday(&t1, 0);

                long elapsed = (t1.tv_sec - t0.tv_sec) * 1000000
                               + t1.tv_usec - t0.tv_usec;
                printf("Elapsed time is= %ld\n", elapsed);
        }
        close(readfd);
        close(writefd);
        return 0;
}
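
(To reproduce: compile with gcc, e.g. "gcc -o fsync_test fsync_test.c", and run it against any
reasonably large file; each printed line is the latency of one fsync in microseconds. The output
name fsync_test is just an illustrative choice.)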


-- As expected, on one machine fsync typically takes about 50000 microseconds to complete, while
on the other machine it took around 200-290 microseconds on average. So is the machine with the
higher performance doing an 'fsync lie'?
-- If I have understood it correctly, an "fsync lie" means the data is not actually written to the
disk and instead sits in some disk/controller buffer. I) If the disk loses power due to a shutdown
or some other disaster, that data will be lost. II) Can data be lost even without that? (e.g. if
the drive keeps data in some disk buffer while fsync is being invoked continuously, can that data
also be lost?) If only part I is true, it can be acceptable, because the probability of a shutdown
is usually low in a production environment. But if even II is true, then there is a problem.
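
(As a rough sanity check: a 7200 RPM spindle takes about 8.3 ms per revolution, so a
sub-millisecond fsync on a spinning disk almost certainly means some cache is absorbing the write.
Below is a minimal sketch of a stricter probe, assuming Linux; the file name "sync_probe" is
illustrative. Opening with O_SYNC makes every write(2) durable before it returns, and O_DIRECT
additionally bypasses the OS page cache, at the cost of requiring block-aligned buffers:

#define _GNU_SOURCE   /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* O_SYNC: each write(2) implies the durability of fsync(2).
           O_DIRECT: bypass the kernel page cache (needs aligned buffers). */
        int fd = open("sync_probe", O_WRONLY | O_CREAT | O_SYNC | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1;
        memset(buf, 'x', 4096);

        /* When this returns, the data is as durable as the device admits;
           a lying drive cache is the only thing left that can hide it. */
        if (write(fd, buf, 4096) != (ssize_t)4096)
                perror("write");

        free(buf);
        close(fd);
        return 0;
}

If even this path shows microsecond latencies, the drive or controller is caching behind
the kernel's back.)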

-- But on the machine where the disk doesn't lie, the performance of Flume using the file channel
is very low (I have seen at most 100 KB/sec, even with sufficient DirectMemory allocation). Does
anybody have stats about file channel throughput? Is anybody getting better performance with the
file channel (without fsync lies)? What is the recommended usage of it for an average scenario?
(Transferring files of a few MBs to an HDFS sink continuously on typical hardware: 16 core
processors, 16 GB RAM, etc.)
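
(For reference, a sketch of the kind of configuration I mean, in Flume 1.x properties syntax; the
agent/component names, paths, and numbers are illustrative, not a recommendation:

agent.sources = src
agent.channels = fc
agent.sinks = hdfs-sink

agent.sources.src.type = exec
agent.sources.src.command = cat /path/to/input
agent.sources.src.channels = fc

# File channel: checkpoint and data directories ideally sit on separate disks.
agent.channels.fc.type = file
agent.channels.fc.checkpointDir = /flume/checkpoint
agent.channels.fc.dataDirs = /flume/data
agent.channels.fc.capacity = 1000000
agent.channels.fc.transactionCapacity = 10000

# HDFS sink: a larger batch amortizes one channel fsync over many events.
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = fc
agent.sinks.hdfs-sink.hdfs.path = /flume/events
agent.sinks.hdfs-sink.hdfs.batchSize = 10000
agent.sinks.hdfs-sink.hdfs.rollInterval = 30
)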


Regards,
Jagadish

On 10/10/2012 11:30 PM, Brock Noland wrote:
Hi,

On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani
<jagadish.bihani@pubmatic.com> wrote:
Hi Brock

I will surely look into 'fsync lies'.

But as per my experiments I think the "file channel" is causing the issue,
because on those 2 machines (one with higher throughput and the other with
lower) I did the following experiment:

cat source - memory channel - file sink.

Now with this setup I got the same throughput on both machines (around 3
MB/sec). Since I have used a "file sink", it should also do an "fsync" at
some point in time; the 'File Sink' and 'File Channel' both do disk writes.
So if there is a difference in disk behaviour, it should be visible in the
'File Sink' too.

Am I missing something here?
File sink does not call fsync.

Regards,
Jagadish



On 10/10/2012 09:35 PM, Brock Noland wrote:
OK your disk that is giving you 40KB/second is telling you the truth
and the faster disk is lying to you. Look up "fsync lies" to see what
I am referring to.

A spinning disk can do 100 fsync operations per second (one is done
at the end of every batch). That is how I estimated your event size:
40KB/second over 100 batches is 40KB / 100 = 409 bytes per event.

Once again, if you want increased performance, you should increase the
batch size.
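
A back-of-the-envelope version of that estimate (a sketch; the batch sizes
below are illustrative):

#include <stdio.h>

int main(void)
{
        /* Assumptions from this thread: ~100 fsyncs/sec on a spinning disk,
           one fsync per batch, events of roughly 409 bytes. */
        double fsyncs_per_sec = 100.0;
        double event_bytes = 409.0;
        int batch_sizes[] = { 1, 100, 1000 };

        for (int i = 0; i < 3; i++) {
                double bytes_per_sec = fsyncs_per_sec * batch_sizes[i] * event_bytes;
                printf("batch size %4d -> %9.0f KB/sec\n",
                       batch_sizes[i], bytes_per_sec / 1024.0);
        }
        return 0;
}

With a batch size of 1 this reproduces the ~40 KB/sec observed; at 1000, the
same disk could sustain tens of MB/sec.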

Brock

On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani
<jagadish.bihani@pubmatic.com> wrote:
Hi

Yes. It is around 480 - 500 bytes.


On 10/10/2012 09:24 PM, Brock Noland wrote:
How big are your events? Average about 400 bytes?

Brock

On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
<jagadish.bihani@pubmatic.com> wrote:
Hi

Thanks for the inputs Brock. After doing several experiments, the problem
eventually boiled down to disks.

-- But I had used the same configuration on all 3 machines (so all software
components are the same on all 3 machines).
-- In the User Guide it is written that if multiple file channel instances
are active on the same agent, then different disks are preferable. But in my
case only one file channel is active per agent.
-- The only pattern I observed is that the machines where I got better
performance have multiple disks. But I don't understand how that helps if I
have only 1 active file channel.
-- What is the impact of the type of disk/disk device driver on performance?
I mean, I don't understand why with 1 disk I am getting 40 KB/sec and with
the other 2 MB/sec.

Could you please elaborate on the correlation between the file channel and
disks.

Regards,
Jagadish


On 10/09/2012 08:01 PM, Brock Noland wrote:

Hi,

Using the file channel, the number and type of disks is going to be much
more predictive of performance than CPU or RAM. Note that consumer-level
drives/controllers will give you much
"better" performance because they lie to you about when your data is
actually written to the drive. If you search for "fsync lies" you'll
find more information on this.

You probably want to increase the batch size to get better performance.

Brock

On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
<jagadish.bihani@pubmatic.com> wrote:

Hi

My flume setup is:

Source Agent : cat source - File Channel - Avro Sink
Dest Agent :     avro source - File Channel - HDFS Sink.

There is only 1 source agent and 1 destination agent.

I measure throughput as the amount of data written to HDFS per second.
(I have a rolling interval of 30 sec; so if a 60 MB file is generated in
30 sec, the throughput is 2 MB/sec.)

I have run the source agent on various machines with different hardware
configurations.
(In all cases I run the flume agent with JAVA OPTIONS as
"-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
-XX:MaxDirectMemorySize=2g")

JDK is 32 bit.

Experiment 1:
=====
RAM : 16 GB
Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
64 bit Processor with 64 bit Kernel.
Throughput: 2 MB/sec

Experiment 2:
======
RAM : 4 GB
Processor: Intel Xeon E5504 @ 2.00GHz (4 cores).
64 bit Processor with 32 bit Kernel.
Throughput : 30 KB/sec

Experiment 3:
======
RAM : 8 GB
Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores).
64 bit Processor with 32 bit Kernel.
Throughput : 80 KB/sec

-- So as can be seen, there is a huge difference in throughput with the
same configuration but different hardware.
-- In the first case, where throughput is higher, RES is around 160 MB; in
the other cases it is in the range of 40 MB - 50 MB.

Can anybody please give insights into why there is this huge difference in
throughput? What is the correlation between RAM and file channel/HDFS sink
performance, and also with a 32-bit/64-bit kernel?

Regards,
Jagadish




            


      


