flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Re: Re: Re: HDFS SINK Performance
Date Tue, 28 Aug 2012 11:47:13 GMT
Do you have a batch size configured for HDFSSink?
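
For what it's worth, in Flume 1.x the HDFS sink reads its batch size from an `hdfs.`-prefixed key, so a bare `batchsize` line (as in the configuration quoted below) may be ignored and the sink left on its default. A minimal sketch of an explicit setting (the value is illustrative, not a recommendation):

```properties
# Flume 1.x HDFS sink batch size: note the "hdfs." prefix and camelCase key.
# The sink takes up to this many events from the channel per transaction
# before writing them to HDFS. Illustrative value only; tune for your rate.
collector2.sinks.hdfs.hdfs.batchSize = 1000
```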

On Tue, Aug 28, 2012 at 12:42 AM, Shara Shi <shiruihong@dhgate.com> wrote:

> Hi Patrick
>
> I tried to send a data file larger than 200MB via the flume avro-client to a
> flume agent with an HDFS sink.
> I think most of the events are sitting in the channel (memory), but flushing
> to HDFS (disk) is very slow.
> If I use hadoop fs -put xxx xxx, the performance is fine; it takes just
> several seconds.
>
> My events are big, over 1KB each.
> I use Flume 1.2.0 and my Hadoop cluster is CDH4.
>
> Regards
> Shara
>
> -----Original Message-----
> From: Patrick Wendell [mailto:pwendell@gmail.com]
> Sent: August 28, 2012 13:11
> To: user@flume.apache.org
> Subject: Re: Re: Re: HDFS SINK Performance
>
> Hey,
>
> Can you let us know the rate at which data is arriving at collector2? Roughly
> how many events/second and bytes/second?
>
> Also, why is your batch size so large? I'm not sure, but I think the sink may
> wait until it has received batchSize events before it decides to flush them
> to HDFS... so this may create strange results depending on how many
> events/second you have.
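
Patrick's concern can be checked with back-of-envelope arithmetic: at the ~20MB/min and ~1KB-per-event figures mentioned elsewhere in this thread, a 50,000-event batch takes longer to fill than the configured 120-second roll interval. A quick sketch (rates taken from the thread; the formula is plain arithmetic):

```python
# Estimate how long the sink must wait to accumulate one full batch,
# given the event size and arrival rate reported in this thread.

def seconds_to_fill_batch(batch_size, throughput_mb_per_min, event_size_kb=1.0):
    """Seconds needed to receive batch_size events at the given throughput."""
    events_per_sec = (throughput_mb_per_min * 1024 / event_size_kb) / 60.0
    return batch_size / events_per_sec

# ~20 MB/min of ~1 KB events, with batchsize 50000 as in the quoted config:
wait = seconds_to_fill_batch(50000, 20)
print(round(wait))  # ~146 seconds, longer than the 120s rollInterval
```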
>
> - Patrick
>
> On Mon, Aug 27, 2012 at 9:48 PM, Mohit Anchlia <mohitanchlia@gmail.com>
> wrote:
> > Do you get better performance when you write directly to the cluster?
> > Can you run some tests writing to the cluster directly and compare?
> >
> >
> > On Mon, Aug 27, 2012 at 8:19 PM, Shara Shi <shiruihong@dhgate.com>
> wrote:
> >>
> >> Hi Denny
> >>
> >> It is 20MB/min, I confirmed.
> >> I sent data via avro-client from the local machine to the flume agent, and
> >> I really got 20MB/min.
> >> So I am trying to find out why.
> >>
> >> Regards
> >> Shara
> >>
> >> From: Denny Ye [mailto:dennyy99@gmail.com]
> >> Sent: August 28, 2012 11:02
> >> To: user@flume.apache.org
> >> Subject: Re: Re: HDFS SINK Performance
> >>
> >>
> >> 20MB/min or 20MB/sec?
> >> I suspect there may be a typo in the units. Can you confirm?
> >>
> >> -Regards
> >> Denny Ye
> >>
> >> 2012/8/28 Shara Shi <shiruihong@dhgate.com>
> >>
> >> Hi Denny
> >>
> >> A throughput of 45MB/sec would be fine for me,
> >> but I am only getting 20MB per minute.
> >> What's wrong with my configuration?
> >>
> >> Regards
> >> Shara
> >>
> >>
> >> From: Denny Ye [mailto:dennyy99@gmail.com]
> >> Sent: August 27, 2012 20:05
> >> To: user@flume.apache.org
> >> Subject: Re: HDFS SINK Performance
> >>
> >>
> >>
> >> Hi Shara,
> >>
> >>     You are using MemoryChannel as the repository. I tested it and got
> >> 45MB/sec without full GC using locally updated code. Is this your goal, or
> >> do you need even higher throughput?
> >>
> >> -Regards
> >> Denny Ye
> >>
> >> 2012/8/27 Shara Shi <shiruihong@dhgate.com>
> >>
> >> Hi All,
> >>
> >> Whatever HDFS sink parameters I tune, I can't get performance above 20MB
> >> per minute.
> >> Is that normal? I think it is weird.
> >> How can I improve it?
> >>
> >> Regards
> >> Ruihong Shi
> >>
> >> ==========================================
> >>
> >>
> >> # Define a memory channel called ch2 on collector2
> >> collector2.channels.ch2.type = memory
> >> collector2.channels.ch2.capacity = 500000
> >> collector2.channels.ch2.keep-alive = 1
> >>
> >> # Define an Avro source called avro-source1 on collector2 and tell it
> >> # to bind to 0.0.0.0:41415. Connect it to channel ch2.
> >> collector2.sources.avro-source1.channels = ch2
> >> collector2.sources.avro-source1.type = avro
> >> collector2.sources.avro-source1.bind = 0.0.0.0
> >> collector2.sources.avro-source1.port = 41415
> >> collector2.sources.avro-source1.threads = 10
> >>
> >> # Define an HDFS sink
> >> collector2.sinks.hdfs.channel = ch2
> >> collector2.sinks.hdfs.type = hdfs
> >> collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
> >> collector2.sinks.hdfs.batchsize = 50000
> >> collector2.sinks.hdfs.runner.type = polling
> >> collector2.sinks.hdfs.runner.polling.interval = 1
> >> collector2.sinks.hdfs.hdfs.rollInterval = 120
> >> collector2.sinks.hdfs.hdfs.rollSize = 0
> >> collector2.sinks.hdfs.hdfs.rollCount = 300000
> >> collector2.sinks.hdfs.hdfs.fileType = DataStream
> >> collector2.sinks.hdfs.hdfs.round = true
> >> collector2.sinks.hdfs.hdfs.roundValue = 10
> >> collector2.sinks.hdfs.hdfs.roundUnit = minute
> >> collector2.sinks.hdfs.hdfs.threadsPoolSize = 10
> >> collector2.sinks.hdfs.hdfs.rollTimerPoolSize = 10
> >>
> >> # Finally, now that we've defined all of our components, tell
> >> # collector2 which ones we want to activate.
> >>
> >>
> >>
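
The pasted configuration ends at the activation comment without the activation lines themselves. In Flume 1.x these are required for the agent to start; a sketch reconstructed from the component names above (assuming nothing else was defined in the file):

```properties
# Activate the channel, source, and sink defined above (reconstructed sketch)
collector2.channels = ch2
collector2.sources = avro-source1
collector2.sinks = hdfs
```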
> >>
> >
> >
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
