flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pravesh Bhardwaj <pravesh.bhard...@gmail.com>
Subject Control character stuffing when using Kafka Sink
Date Thu, 18 Feb 2016 03:44:24 GMT
Greetings,

I am trying to use Flume to read my source files (pipe delimited text
files) and feed them to kafka.

All of the plumbing seems to work fine and all the records are getting in
to kafka successfully. However, Flume seems to add NUL and STX control
characters at the start of each data line.

This is bit if an issue for me because eventually I am streaming all the
data from Kafka to Amazon S3 and finally into Amazon Redshift. Owing to
these control characters, my data loads to Redshift (COPY command) are
failing.

I have spent considerable time researching this problem on internet but no
luck so far. Can I somehow instruct Flume not to add these characters in my
source data?

My Flume configuration is given below. I have tried both "exec" and
"spoolDir" sources and it changed nothing.

kafka-agent.channels=ch1

kafka-agent.channels.ch1.type=org.apache.flume.channel.kafka.KafkaChannel
kafka-agent.channels.ch1.brokerList=localhost:9092
kafka-agent.channels.ch1.topic=call-center-dimension
kafka-agent.channels.ch1.zookeeperConnect=localhost:2181
kafka-agent.channels.ch1.capacity=10000
kafka-agent.channels.ch1.transactionCapacity=10000
kafka-agent.channels.ch1.parseAsFlumeEvent = true
kafka-agent.channels.ch1.kafka.serializer.class=kafka.serializer.DefaultEncoder

kafka-agent.sources=tail

#kafka-agent.sources.tail.type=spooldir
#kafka-agent.sources.tail.channels=ch1
#kafka-agent.sources.tail.spoolDir=/home/ec2-user/flumespool
#kafka-agent.sources.tail.fileHeader=false

kafka-agent.sources.tail.type=exec
kafka-agent.sources.tail.channels=ch1
kafka-agent.sources.tail.shell=/bin/bash -c
kafka-agent.sources.tail.command=cat /tmp/call_center_dimension_1.out

kafka-agent.sinks=sink1
kafka-agent.sinks.sink1.type=org.apache.flume.sink.kafka.KafkaSink
kafka-agent.sinks.sink1.brokerList=localhost:9092
kafka-agent.sinks.sink1.topic=kafka
kafka-agent.sinks.sink1.channel=ch1
kafka-agent.sinks.sink1.batchSize=5
kafka-agent.sinks.sink1.kafka.serializer.class=kafka.serializer.StringEncoder

Any help is appreciated.

Thanks

Mime
View raw message