From: "Shara Shi"
Reply-To: user@flume.apache.org
Subject: HDFS SINK Performance
Date: Mon, 27 Aug 2012 17:26:30 +0800
Message-ID: <001d01cd8436$0ad05be0$207113a0$@com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_001E_01CD8479.18F39BE0"

This is a multi-part message in MIME format.

------=_NextPart_000_001E_01CD8479.18F39BE0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi All,

No matter how I tune the HDFS sink parameters, I can't get throughput above 20 MB per minute.
Is that normal? It seems odd to me.
How can I improve it?

Regards,
Ruihong Shi

==========================================

# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# Define a memory channel called ch2 on collector2
collector2.channels.ch2.type = memory
collector2.channels.ch2.capacity = 500000
collector2.channels.ch2.keep-alive = 1

# Define an Avro source called avro-source1 on collector2 and tell it
# to bind to 0.0.0.0:41415. Connect it to channel ch2.
collector2.sources.avro-source1.channels = ch2
collector2.sources.avro-source1.type = avro
collector2.sources.avro-source1.bind = 0.0.0.0
collector2.sources.avro-source1.port = 41415
collector2.sources.avro-soruce1.threads = 10

# Define a hdfs sink
collector2.sinks.hdfs.channel = ch2
collector2.sinks.hdfs.type = hdfs
collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
collector2.sinks.hdfs.batchsize = 50000
collector2.sinks.hdfs.runner.type = polling
collector2.sinks.hdfs.runner.polling.interval = 1
collector2.sinks.hdfs.hdfs.rollInterval = 120
collector2.sinks.hdfs.hdfs.rollSize = 0
collector2.sinks.hdfs.hdfs.rollCount = 300000
collector2.sinks.hdfs.hdfs.fileType = DataStream
collector2.sinks.hdfs.hdfs.round = true
collector2.sinks.hdfs.hdfs.roundValue = 10
collector2.sinks.hdfs.hdfs.roundUnit = minute
collector2.sinks.hdfs.hdfs.threadsPoolSize = 10
collector2.sinks.hdfs.hdfs.rollTimerPoolSize = 10

# Finally, now that we've defined all of our components, tell
# collector2 which ones we want to activate.

------=_NextPart_000_001E_01CD8479.18F39BE0--
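
Two things in the pasted configuration may be worth checking before tuning further. The source name is spelled `avro-soruce1` on the `threads` line but `avro-source1` everywhere else, and the sink's batch setting is written as `batchsize`, whereas the Flume User Guide lists the HDFS sink key as `hdfs.batchSize`. Below is a minimal Python sketch for spotting both kinds of slip in a properties file. It is not part of Flume: the function names and the `DOCUMENTED_HDFS_KEYS` subset are assumptions made for illustration, and the key list should be checked against the user guide for the Flume version in use.

```python
import difflib
from collections import defaultdict

# Assumed subset of HDFS sink keys as listed in the Flume User Guide;
# verify against the guide for your Flume version before relying on it.
DOCUMENTED_HDFS_KEYS = {
    "hdfs.path", "hdfs.batchSize", "hdfs.rollInterval", "hdfs.rollSize",
    "hdfs.rollCount", "hdfs.fileType", "hdfs.round", "hdfs.roundValue",
    "hdfs.roundUnit", "hdfs.threadsPoolSize", "hdfs.rollTimerPoolSize",
}

KINDS = ("sources", "channels", "sinks")


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def near_duplicate_names(props, cutoff=0.8):
    """Return pairs of component names that look like typos of each other."""
    groups = defaultdict(set)
    for key in props:
        parts = key.split(".")
        if len(parts) >= 4 and parts[1] in KINDS:
            groups[(parts[0], parts[1])].add(parts[2])
    pairs = set()
    for names in groups.values():
        for name in names:
            others = [n for n in names if n != name]
            for match in difflib.get_close_matches(name, others, cutoff=cutoff):
                pairs.add(tuple(sorted((name, match))))
    return sorted(pairs)


def misnamed_hdfs_keys(props, sink_name):
    """Map sink keys that differ from a documented key only by case or prefix."""
    by_leaf = {k.split(".")[-1].lower(): k for k in DOCUMENTED_HDFS_KEYS}
    findings = {}
    for key in props:
        parts = key.split(".")
        if len(parts) < 4 or parts[1] != "sinks" or parts[2] != sink_name:
            continue
        suffix = ".".join(parts[3:])
        expected = by_leaf.get(parts[-1].lower())
        if expected is not None and suffix != expected:
            findings[key] = expected
    return findings


# A few lines copied from the configuration in the message above.
SAMPLE = """\
collector2.sources.avro-source1.port = 41415
collector2.sources.avro-soruce1.threads = 10
collector2.sinks.hdfs.batchsize = 50000
collector2.sinks.hdfs.hdfs.rollCount = 300000
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    print(near_duplicate_names(props))
    print(misnamed_hdfs_keys(props, "hdfs"))
```

Whether either slip explains the 20 MB per minute ceiling is not established here. They are simply settings that may not be taking effect as intended, so the observed throughput may reflect default values rather than the tuned ones.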