flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: HDFS Sink performance
Date Thu, 30 Jul 2015 22:38:21 GMT
I had the opportunity to take some quick measurements with the HBase and Async HBase sinks.
Have updated the wiki with the numbers.

Contrary to my expectation, the HBase sink outperformed the Async HBase sink. There may not
be much reason to use the Async version, especially since it also lacks Kerberos support.

-roshan


From: Hari Shreedharan <hshreedharan@cloudera.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Friday, July 24, 2015 1:27 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: HDFS Sink performance

I am inclined to believe that this is more a Spool Dir source issue than a channel issue -
which is considerably better news (a regression in one source is better than an issue in
the channels, which would affect the entire framework).


Thanks,
Hari

On Fri, Jul 24, 2015 at 12:15 PM, Robert B Hamilton <robert.hamilton@gm.com> wrote:
I WAS saying just that, Roshan, but I was wrong!

I was having an issue with the spooldir source which was spoiling the results for versions
1.5 and 1.6.  When I switch to an exec source the issue largely disappears, and the hdfs
sink/file channel performance of 1.5 and 1.6 is not measurably worse than 1.3 for 25K and
larger event sizes after all.  Well, I did say these were quick measurements... I will
update the list after taking more careful tests...

-----Original Message-----
From: Roshan Naik [mailto:roshan@hortonworks.com]
Sent: Thursday, July 23, 2015 2:16 PM
To: user@flume.apache.org
Subject: Re: HDFS Sink performance

Robert: Are you saying that the MemCh perf with the Null sink also exhibits the same perf
degradation?

A side note: the Spillable channel has a faster-performing memory channel (and spilling to
disk can be disabled), but unfortunately there is an issue with its metrics publishing which
is kind of hard to fix.
-roshan


On 7/23/15 12:00 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:

>I now believe that Roshan is correct that the channel may be the place
>to look.
>
>With tests using null sinks I had found that the channel was not much
>of a factor with 1.3, but now that I check 1.5 and 1.6 with null sinks,
>they still show the same pattern of performance degradation.  The
>interesting thing is that I find similar performance hits both when
>using file channel AND when using memory channel.  Looking forward to
>Johny's findings.
>
>
>From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
>Sent: Thursday, July 23, 2015 12:33 PM
>To: user@flume.apache.org
>Subject: Re: HDFS Sink performance
>
>This is interesting. I believe Johny is actually looking into this
>performance issue.
>
>
>
>Thanks,
>Hari
>
>On Thu, Jul 23, 2015 at 9:27 AM, lohit <lohit.vijayarenu@gmail.com> wrote:
>Majority of messages need not be persisted to disk for us. So, we are
>interested in MemoryChannel.
>There has been gradual performance degradation from 1.3.1 -> 1.4.0 ->
>1.6.0.
>See the graph below, where I have a constant stream of messages (blue
>line). While this is happening I swap in different versions of flume for
>the agent.
>The orange line shows messages dropped. (The flat line is when data is
>streamed to HDFS.) I have marked the flat lines with the different versions.
>
>
>
>2015-07-22 19:48 GMT-07:00 Roshan Naik <roshan@hortonworks.com>:
>
>My guess is that most of you will probably use the File channel in
>production with the HDFS sink? In that scenario the common observation
>seems to be that the File channel becomes the primary bottleneck. Going
>by Robert's observations, its throughput also seems to have dropped
>since v1.3.
>
>Robert, can you confirm how many data dirs were used for your readings
>with FCh?
>
>-roshan
>
>
>
>From: lohit <lohit.vijayarenu@gmail.com>
>Reply-To: "user@flume.apache.org" <user@flume.apache.org>
>Date: Wednesday, July 22, 2015 3:01 PM
>To: "user@flume.apache.org" <user@flume.apache.org>
>
>Subject: Re: HDFS Sink performance
>
>Thanks for sharing these numbers, Robert. Curious, I did the same
>experiment.
>Flume 1.3.1 has higher throughput than 1.6.0 (I was able to get a
>sustained 60MB/s with Flume 1.3.1). No config or setup change; just
>changing the flume version shows this difference. We should probably
>look at the change set between 1.3.1 and 1.5 to see if there were any
>obvious changes.
>
>2015-07-22 14:00 GMT-07:00 Robert B Hamilton <robert.hamilton@gm.com>:
>Here is a comparison between versions 1.3, 1.5, and 1.6.
>I would estimate that error bars are plus or minus 15%.
>
>All parameters are identical, as between runs all I change is the
>version of flume.
>Lohit's numbers are fairly consistent with this: if we double the sinks
>from my 4 to his 8 and assume linear scalability, we would expect to
>get somewhere close to 30-40MB/s.
>
>It looks like the drop off is more pronounced for the larger event size.
>This is of concern to us because we are looking at this for a high
>volume feed with message sizes up to 80 kB.
>
>------------------------------------------
>HDFS x4 sinks, Memory channel (MB/s)
>------------------------------------------
>Payload (kB)   v1.3   v1.5   v1.6
>------------   ----   ----   ----
> 1             27     17     20
>25             56     15     15
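The linear-scaling estimate in the message above works out as simple arithmetic (a sketch of my own, assuming throughput scales linearly with sink count; the 15-20 MB/s range is read off the v1.5/v1.6 columns in the table):

```python
# Linear-scaling estimate: double the sink count, double the throughput.
# Assumption: aggregate throughput scales linearly with the number of sinks,
# which is only a rough approximation in practice.
rate_4_sinks = (15.0, 20.0)   # MB/s range for 4 sinks (v1.5/v1.6 rows)
rate_8_sinks = tuple(r * 8 / 4 for r in rate_4_sinks)
print(rate_8_sinks)  # -> (30.0, 40.0), i.e. the "close to 30-40MB/s" estimate
```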
>
>
>
>From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
>Sent: Wednesday, July 22, 2015 1:27 PM
>To: user@flume.apache.org
>Subject: Re: HDFS Sink performance
>
>That is a bit disconcerting. Are you using the same HDFS setup and same
>config for both tests? Would it be possible for you to take a look at
>Flume 1.6.0? Such drops in performance should be taken care of.
>
>
>
>Thanks,
>Hari
>
>On Wed, Jul 22, 2015 at 11:04 AM, Robert B Hamilton
><robert.hamilton@gm.com> wrote:
>My mailer totally scrambled the numbers, probably by inserting special
>characters.
>Sorry, here are the actual results....
>
>All rates in MB/s
>Payload in KB
>
>Flume 1.3.1
>Payload   Rate memch   Rate Fch
>25        34           29
>25        31           27.6
>25        50           23.3
>25        46.5         27.2
>50        31.3         23.8
>50        37.4         31.3
>50        32.3         31.8
>80        30.5         25.8
>80        46.2         25.2
>80        39.1         25.8
>80        56.5         25.1
>
>Flume 1.5
>Payload   Rate memch   Rate Fch
>25        18.7         15.6
>50        18.3         17.3
>80        18.4         15.6
>
>-----Original Message-----
>From: Robert B Hamilton [mailto:robert.hamilton@gm.com]
>Sent: Wednesday, July 22, 2015 11:00 AM
>To: user@flume.apache.org
>Subject: RE: HDFS Sink performance
>
> I only see that kind of throughput for event sizes of 25kB to 50kB or
>larger.
>
>These particular tests are done on flume version 1.3.1.
>But because you asked, I thought I'd do a few quick runs on 1.5.0.1 and
>added those results below.  The results are significantly different for
>1.5, and I wonder if this is a cause for concern.
>
>None of this has been peer reviewed so it should be considered as
>tentative.
>
>As to the HDD, here is result of a quick and dirty dd test.
>
>  dd if=/dev/zero of=100M bs=1M count=100 conv=fsync oflag=sync
>   104857600 bytes (105 MB) copied, 0.685646 s, 153 MB/s
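The dd figure is just bytes over elapsed seconds (a quick check of the arithmetic; dd reports decimal megabytes):

```python
# Recompute dd's reported rate: bytes copied / elapsed seconds.
bytes_copied = 104857600      # 100 MiB written with fsync
elapsed_s = 0.685646
mb_per_s = bytes_copied / elapsed_s / 1_000_000   # dd uses decimal MB
print(round(mb_per_s))  # -> 153, matching the dd output above
```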
>
>
>Source data: each record consists of random ASCII strings of constant
>length (25k, 50k, or 80k depending on the run).
>Source: spooldir
>Channel: file channel with a single dataDir, or memory channel.
>Sink: four HDFS sinks, SequenceFile, Text, batch size=10, rollInterval=20
>seconds.
>
>Batch size was kept small because of memory channel capacity.
>Increasing batch size for file channel did not improve performance so I
>kept it at 10.
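For reference, the setup described above might look roughly like this in a flume properties file (a sketch under my own naming: the agent and component names are illustrative, and only one of the four HDFS sinks is shown):

```properties
agent.sources = spool
agent.channels = fileCh
agent.sinks = hdfs1 hdfs2 hdfs3 hdfs4

# One of the four HDFS sinks (hdfs2..hdfs4 would be configured the same way)
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.channel = fileCh
agent.sinks.hdfs1.hdfs.fileType = SequenceFile
agent.sinks.hdfs1.hdfs.writeFormat = Text
agent.sinks.hdfs1.hdfs.batchSize = 10
agent.sinks.hdfs1.hdfs.rollInterval = 20

# File channel with a single data dir, as in the runs above
agent.channels.fileCh.type = file
```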
>
>Here are numbers for some runs where the payload is varied over 25K,
>50K, and 80K. I include the memory channel for comparison.
>
>Multiple runs were performed for each event size. As you can see, the
>throughput can vary from run to run because these particular
>measurements were done in an environment that is not tightly
>controlled.  Think of them as "in situ" measurements :)
>
>Flume 1.3.1 memory channel and file channel
>-------------------------------------------------------
>Payload   Rate memch   Rate filech
>(kB)      (MB/s)       (MB/s)
>-------------------------------------------------------
>25        34           29
>25        31           27.6
>25        50           23.3
>25        46.5         27.2
>50        31.2         23.8
>50        37.4         31.3
>50        32.3         31.8
>80        30.5         25.8
>80        46.2         25.2
>80        39.1         25.8
>80        56.5         25.1
>
>
>Flume 1.5 File Channel and Memory Channel
>---------------------------------------------------
>Event size   Rate memch   Rate filech
>(KB)         (MB/s)       (MB/s)
>---------------------------------------------------
>25           18.7         15.6
>50           18.3         17.3
>80           18.4         15.6
>
>-----Original Message-----
>From: Roshan Naik [mailto:roshan@hortonworks.com]
>Sent: Friday, July 17, 2015 6:21 PM
>To: user@flume.apache.org
>Subject: Re: HDFS Sink performance
>
>I updated the Flume wiki with my measurements. Also added a section
>with Hive sink measurements.
>
>https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2
>
>
>@Robert:
>  What sort of HDD are you using?
>  What is the event size?
>  Which version of flume?
>
>-roshan
>
>
>
>
>On 7/17/15 12:51 PM, "Robert B Hamilton" <robert.hamilton@gm.com> wrote:
>
>>Our testing has shown up to 60MB/s to HDFS if we use up to 8 or 10
>>sinks per agent, and with a file channel with a single dataDir.
>>
>>
>>From: lohit [mailto:lohit.vijayarenu@gmail.com]
>>Sent: Wednesday, July 15, 2015 11:11 AM
>>To: user@flume.apache.org
>>Subject: HDFS Sink performance
>>
>>Hello,
>>
>>Does anyone have numbers they can share around HDFS sink
>>performance? From our testing, a single sink writing to HDFS
>>(CompressedStream) and reading from a MemoryChannel can only do about
>>35000 events per second (each event is about 1K in size). After
>>compression this turns out to be a ~10MB/s write stream to the HDFS
>>file, which is pretty low. Our configuration looks like this:
>>
>>agent.sinks.hdfsSink.type = hdfs
>>agent.sinks.hdfsSink.channel = memoryChannel
>>agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>>agent.sinks.hdfsSink.hdfs.codeC = lzo
>>agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>>agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>>agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>>agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>>agent.sinks.hdfsSink.hdfs.rollCount = 0
>>agent.sinks.hdfsSink.hdfs.batchSize = 10000
>>agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>
>>agent.channels.memoryChannel.type = memory
>>
>>agent.channels.memoryChannel.capacity = 3000000
>>agent.channels.memoryChannel.transactionCapacity = 10000
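As a sanity check on those numbers (my own arithmetic, not from the thread): 35000 events/s at ~1 KB each is ~34 MB/s of raw event data, so a ~10 MB/s compressed stream implies roughly a 3.4x effective lzo ratio:

```python
# Raw throughput implied by the event rate, and the effective compression
# ratio implied by the observed ~10 MB/s write rate.
events_per_s = 35000
event_size_bytes = 1024            # "each event is about 1K"
raw_mb_per_s = events_per_s * event_size_bytes / (1024 * 1024)
compressed_mb_per_s = 10           # observed HDFS write rate
print(round(raw_mb_per_s, 1))                        # -> 34.2 MB/s raw
print(round(raw_mb_per_s / compressed_mb_per_s, 1))  # -> 3.4x effective ratio
```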
>>
>>--
>>Have a Nice Day!
>>Lohit
>>
>>
>>Nothing in this message is intended to constitute an electronic
>>signature unless a specific statement to the contrary is included in
>>this message.
>>
>>Confidentiality Note: This message is intended only for the person or
>>entity to which it is addressed. It may contain confidential and/or
>>privileged material. Any review, transmission, dissemination or other
>>use, or taking of any action in reliance upon this message by persons
>>or entities other than the intended recipient is prohibited and may be
>>unlawful. If you received this message in error, please contact the
>>sender and delete it from your computer.
>
>
>
>
>
>
>
>
>
>
>
>
>--
>Have a Nice Day!
>Lohit
>
>
>
>
>--
>Have a Nice Day!
>Lohit
>
>
>



