ambari-user mailing list archives

From David Novogrodsky <david.novogrod...@gmail.com>
Subject Setting up and running Flume agents using Ambari
Date Wed, 24 Dec 2014 23:20:48 GMT
All,

I have run Flume agents on a pseudo-distributed VM from Cloudera,
ingesting tweets from Twitter.  When I paste the same configurations
into the Flume section of Ambari, I do not get any data from Twitter.
The screen in Ambari says the agents are running, but when I list the
output directory in HDFS, I see no files:

[root@namenode PBX]# hadoop fs -ls  /user/flume/tweets
[root@namenode PBX]# hadoop fs -ls  /user/flume/tweets
[root@namenode PBX]# hadoop fs -ls  /user/flume/tweets/
[root@namenode PBX]#


I have attached the cluster parameters in a PDF.

Here is the URL I am using to add the configuration to the Flume agents:
     http://namenode.localdomain.com:8080/#/main/services/FLUME/configs

Here is the configuration for the twitter agent:
# defining the source for the agent for Twitter
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemoryChannel
TwitterAgent.sources.Twitter.consumerKey = (just removing for security)
TwitterAgent.sources.Twitter.accessToken = (removing)
TwitterAgent.sources.Twitter.accessTokenSecret = (removing)
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sources.Twitter.maxBatchSize = 10
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200

# defining the interceptors
TwitterAgent.sources.Twitter.interceptors = i1
TwitterAgent.sources.Twitter.interceptors.i1.type = timestamp


# defining the sink for the agent
TwitterAgent.sinks.HDFS.channel = MemoryChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets/%Y/%m/%d
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 6000
TwitterAgent.sinks.HDFS.hdfs.filePrefix = events-

# defining the channel for the agent
TwitterAgent.channels.MemoryChannel.type = memory
TwitterAgent.channels.MemoryChannel.capacity = 10000
TwitterAgent.channels.MemoryChannel.transactionCapacity = 10000
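
Note: the configuration as posted does not show the lines that name the
agent's components, and it lists no consumerSecret for the Twitter source.
Whether these were trimmed from the post or are missing from the Ambari
configuration is not stated. Flume only starts the sources, channels, and
sinks that are listed for an agent, so a minimal sketch of those lines,
assuming the agent and component names used above and a placeholder in
place of the real secret, might look like this:

```properties
# Assumed additions, not part of the original post.
# Component names are taken from the configuration above.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemoryChannel
TwitterAgent.sinks = HDFS

# Placeholder only; the real value would come from the Twitter app settings.
TwitterAgent.sources.Twitter.consumerSecret = (removed)
```

If these lines are already present in the Ambari configuration, this
fragment changes nothing and the cause lies elsewhere.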


David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky
