flume-user mailing list archives

From Thomas Adam <thomas.a...@tecbot.de>
Subject Flume logs http request info
Date Wed, 27 Feb 2013 11:25:47 GMT
Hi,

I have an issue with my Flume agents, which collect JSON data and save
it to an HDFS store for Hive. Today my daily job broke because of
malformed rows. I looked into the files to see what had happened, and
I found something like this:

...
POST / HTTP/1.0
Host: localhost:50000
Content-Length: 185
Content-Type: application/x-www-form-urlencoded
...

This breaks my JSON serde in Hive. It looks to me as if the Flume
agents are logging request data themselves; I'm sure that I don't send
anything like this.
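As a stopgap before the Hive job runs, lines like these could be filtered out before the serde sees them. A minimal sketch, assuming one JSON object per line (the sample strings below are illustrative, taken from the fragment above):

```python
import json

def keep_valid_json(lines):
    """Yield only lines that parse as JSON objects, dropping stray
    HTTP request text such as 'POST / HTTP/1.0' or header lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            # not valid JSON -> this is one of the malformed rows
            continue
        if isinstance(obj, dict):
            yield line

# illustrative sample mixing a good row with the stray request text
sample = [
    '{"user": 1, "event": "login"}',
    'POST / HTTP/1.0',
    'Host: localhost:50000',
]
print(list(keep_valid_json(sample)))  # only the first line survives
```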

I have two Flume agents.
The first one collects data from my application with the HTTPSource:

http.sources = user_events
http.channels = user_events
http.sinks = user_events

http.sources.user_events.type = org.apache.flume.source.http.HTTPSource
http.sources.user_events.port = 50000
http.sources.user_events.interceptors = timestamp
http.sources.user_events.interceptors.timestamp.type = timestamp
http.sources.user_events.channels = user_events

http.channels.user_events.type = memory
http.channels.user_events.capacity = 100000
http.channels.user_events.transactionCapacity = 1000

http.sinks.user_events.type = avro
http.sinks.user_events.channel = user_events
http.sinks.user_events.hostname = 10.2.0.190
http.sinks.user_events.port = 20000
http.sinks.user_events.batch-size = 100
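
For reference, the default handler of the HTTPSource (JSONHandler) expects a JSON array of events, each with string `headers` and a string `body`. A sketch of what a well-formed request body looks like (the header and payload values are illustrative, not from my setup):

```python
import json

# A JSON array of events in the format Flume's default JSONHandler
# accepts: each event has a "headers" map and a string "body".
events = [
    {
        "headers": {"source": "webapp"},  # illustrative header
        # the body is itself a JSON string in my case
        "body": json.dumps({"user": 1, "event": "login"}),
    }
]

payload = json.dumps(events)
print(payload)  # this is what gets POSTed to port 50000
```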

And the second agent puts the data into HDFS:

hdfs.sources = user_events
hdfs.channels = user_events
hdfs.sinks = user_events

hdfs.sources.user_events.type = avro
hdfs.sources.user_events.channels = user_events
hdfs.sources.user_events.bind = 10.2.0.190
hdfs.sources.user_events.port = 20000

hdfs.channels.user_events.type = memory
hdfs.channels.user_events.capacity = 100000
hdfs.channels.user_events.transactionCapacity = 1000

hdfs.sinks.user_events.type = hdfs
hdfs.sinks.user_events.channel = user_events
hdfs.sinks.user_events.hdfs.path = hdfs://10.2.0.190:8020/user/beeswax/warehouse/user_events/dt=%Y-%m-%d/hour=%H
hdfs.sinks.user_events.hdfs.filePrefix = flume
hdfs.sinks.user_events.hdfs.rollInterval = 600
hdfs.sinks.user_events.hdfs.rollSize = 134217728
hdfs.sinks.user_events.hdfs.rollCount = 0
hdfs.sinks.user_events.hdfs.batchSize = 1000
hdfs.sinks.user_events.hdfs.fileType = DataStream

It has worked for three months without any problems, and I haven't
changed anything in that time.
I use Flume 1.3.0 and CDH 4.1.2.

I hope someone can help me resolve this issue.

Thanks & Regards
Thomas
