flume-user mailing list archives

From "Shara Shi" <shiruih...@dhgate.com>
Subject HDFS SINK Performance
Date Mon, 27 Aug 2012 09:26:30 GMT
Hi All, 

 

No matter how I tune the HDFS sink parameters, I can't get throughput above
20 MB per minute.

Is that normal? It seems unusually low to me.

How can I improve it?

 

Regards

Ruihong Shi

==========================================

 

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

 

# Define a memory channel called ch2 on collector2
collector2.channels.ch2.type = memory
collector2.channels.ch2.capacity = 500000
collector2.channels.ch2.keep-alive = 1

# Define an Avro source called avro-source1 on collector2 and tell it
# to bind to 0.0.0.0:41415. Connect it to channel ch2.
collector2.sources.avro-source1.channels = ch2
collector2.sources.avro-source1.type = avro
collector2.sources.avro-source1.bind = 0.0.0.0
collector2.sources.avro-source1.port = 41415
collector2.sources.avro-source1.threads = 10

# Define an HDFS sink
collector2.sinks.hdfs.channel = ch2
collector2.sinks.hdfs.type = hdfs
collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
collector2.sinks.hdfs.batchsize = 50000
collector2.sinks.hdfs.runner.type = polling
collector2.sinks.hdfs.runner.polling.interval = 1
collector2.sinks.hdfs.hdfs.rollInterval = 120
collector2.sinks.hdfs.hdfs.rollSize = 0
collector2.sinks.hdfs.hdfs.rollCount = 300000
collector2.sinks.hdfs.hdfs.fileType = DataStream
collector2.sinks.hdfs.hdfs.round = true
collector2.sinks.hdfs.hdfs.roundValue = 10
collector2.sinks.hdfs.hdfs.roundUnit = minute
collector2.sinks.hdfs.hdfs.threadsPoolSize = 10
collector2.sinks.hdfs.hdfs.rollTimerPoolSize = 10

# Finally, now that we've defined all of our components, tell
# collector2 which ones we want to activate.
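Flume reads each property key as `<agent>.<sources|channels|sinks>.<component>.<property>`, so a misspelled component name is not flagged as a typo: its properties attach to a differently named component that no source list refers to, and they have no effect on the intended one. The minimal Python sketch below (the function names are hypothetical helpers, not part of Flume) groups keys by component and lists components that never set the required `type` property, which is one cheap way to spot such a slip:

```python
from collections import defaultdict


def group_components(properties_text):
    """Group keys of the form <agent>.<kind>.<component>.<property>.

    Returns {(agent, kind, component): {property: value}}. Comments, blank
    lines and keys with fewer than four dot-separated parts (for example the
    agent-level activation lists) are skipped.
    """
    components = defaultdict(dict)
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Split into at most four parts so nested names like "hdfs.path" stay whole.
        parts = key.split(".", 3)
        if len(parts) < 4 or parts[1] not in ("sources", "channels", "sinks"):
            continue
        agent, kind, component, prop = parts
        components[(agent, kind, component)][prop] = value
    return dict(components)


def components_missing_type(components):
    """Return (agent, kind, component) tuples that never set `type`."""
    return sorted(k for k, props in components.items() if "type" not in props)


sample = """
collector2.sources.avro-source1.type = avro
collector2.sources.avro-source1.port = 41415
collector2.sources.avro-soruce1.threads = 10
"""
print(components_missing_type(group_components(sample)))
# [('collector2', 'sources', 'avro-soruce1')]
```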

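The HDFS sink closes the current file and opens a new one as soon as the first of its roll conditions is met: `hdfs.rollInterval` seconds, `hdfs.rollSize` bytes, or `hdfs.rollCount` events, where a value of 0 disables that condition. The Python sketch below estimates which condition fires first for the values configured above. It assumes a steady event rate and a hypothetical 500-byte average event size; the message states neither, so treat the numbers as illustrative only:

```python
def first_roll_trigger(events_per_sec, bytes_per_event,
                       roll_interval_s=120, roll_size_bytes=0, roll_count=300000):
    """Return (trigger_name, seconds_until_roll) for a steady event rate.

    A setting of 0 disables that trigger, as with hdfs.rollInterval,
    hdfs.rollSize and hdfs.rollCount. Ties go to the interval.
    """
    candidates = []
    if roll_interval_s > 0:
        candidates.append(("rollInterval", float(roll_interval_s)))
    if roll_size_bytes > 0:
        candidates.append(("rollSize", roll_size_bytes / (events_per_sec * bytes_per_event)))
    if roll_count > 0:
        candidates.append(("rollCount", roll_count / events_per_sec))
    if not candidates:
        return ("never", float("inf"))
    return min(candidates, key=lambda c: c[1])


reported_bytes_per_sec = 20 * 1_000_000 / 60  # 20 MB per minute, decimal megabytes
event_size = 500                              # hypothetical average event size in bytes
rate = reported_bytes_per_sec / event_size    # about 667 events per second
trigger, secs = first_roll_trigger(rate, event_size)
print(trigger, round(secs), round(secs * reported_bytes_per_sec / 1e6))
# rollInterval 120 40
```

Under those assumptions the 300000-event count would take about 450 s to reach, so the 120 s interval rolls first and each file holds roughly 40 MB.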

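With `hdfs.round = true`, the sink rounds each event's timestamp down to a multiple of `hdfs.roundValue` in units of `hdfs.roundUnit` before expanding the escape sequences in `hdfs.path`. The Python sketch below mirrors that behaviour with `strftime`, assuming Flume's `%Y`, `%m`, `%d` and `%H` escapes mean the same as Python's (four-digit year, zero-padded month, day and 24-hour hour); `round_down` and `bucket_path` are hypothetical helpers. Because the configured path stops at `%H`, rounding to 10 minutes does not change which hourly directory an event lands in:

```python
from datetime import datetime


def round_down(ts, round_value=10, round_unit="minute"):
    """Round ts down to the highest multiple of round_value in round_unit."""
    if round_unit == "second":
        return ts.replace(second=ts.second - ts.second % round_value, microsecond=0)
    if round_unit == "minute":
        return ts.replace(minute=ts.minute - ts.minute % round_value,
                          second=0, microsecond=0)
    if round_unit == "hour":
        return ts.replace(hour=ts.hour - ts.hour % round_value,
                          minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported round unit: {round_unit}")


def bucket_path(ts, pattern="/user/root/flume/webdata/exec/%Y/%m/%d/%H", **rounding):
    """Expand the date escapes in pattern using the rounded timestamp."""
    return round_down(ts, **rounding).strftime(pattern)


t = datetime(2012, 8, 27, 9, 26, 30)
print(round_down(t))   # 2012-08-27 09:20:00
print(bucket_path(t))  # /user/root/flume/webdata/exec/2012/08/27/09
```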