flume-user mailing list archives

From "Guyle M. Taber" <gu...@gmtech.net>
Subject Flume truncating files at about 2060 characters
Date Mon, 31 Aug 2015 18:03:13 GMT
I’m using an Avro sink to send events to HDFS, and we’re seeing long content lines get
truncated at about the 2060-character mark. How can I prevent long lines from being truncated
when using an Avro sink in this fashion?

Here’s a snippet of an event from the raw logs, before Flume is involved. I’ve toggled the
display of hidden characters (^I is a tab, $ marks an end of line) so you can see the EOL
character that gets inserted in the truncated version and breaks the event into two lines.
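To see exactly where the break falls, one could compare line lengths in the raw log and in the file written to HDFS. Below is a minimal Python sketch, assuming both files are UTF-8 text. The script name long_lines.py and the 2048-character default threshold are illustrative choices of this sketch, not anything taken from Flume.

```python
import sys

def long_lines(lines, limit=2048):
    """Yield (line_number, length) for each line longer than `limit` characters."""
    for lineno, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # do not count the line terminator itself
        if length > limit:
            yield lineno, length

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python long_lines.py /path/to/file [limit]
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else 2048
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for lineno, length in long_lines(f, limit):
            print(f"line {lineno}: {length} characters")
```

Against the raw log it lists the long events; in the HDFS output, many first fragments ending at the same length would point to a fixed per-event cap rather than random corruption.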

…utm_campaign=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4&camp=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4^Isearch-term[=]^Isession-id[=]720D69AB19F1DD17D27A948C9B31D380^Istore-id[=]^Itracking-ticket-id[=]^Itracking-ticket-number[=]^Ievent-session-id[=]98df4905-51ab-43a9-92d9-35d879a69b9a
$

Here’s a snippet of an event that gets truncated.

…utm_campaign=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4&camp=%E5%81%A5%E5%BA%$

B7%E7%BE%8E%E6%8A%A4^Isearch-term[=]^Isession-id[=]720D69AB19F1DD17D27A948C9B31D380^Istore-id[=]^Itracking-ticket-id[=]^Itracking-ticket-number[=]^Ievent-session-id[=]98df4905-51ab-43a9-92d9-35d879a69b9a
$
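The split is consistent with a line reader that caps each event at a fixed number of characters and carries the overflow into the next event. The post does not show the source configuration, so this is an assumption: if the sending agent uses a Spooling Directory source with its default LINE deserializer, the Flume user guide documents this behavior for deserializer.maxLineLength, whose default of 2048 characters is close to the roughly 2060 observed. The Python sketch below only approximates that behavior (it does not reproduce every edge case) to show how one long line becomes two events; split_into_events is a hypothetical name.

```python
def split_into_events(text, max_line_length=2048):
    """Roughly mimic a line reader that caps each event at max_line_length characters.

    Characters past the cap are not dropped; they start the next event,
    which is how the tail of a long line ends up on its own line downstream.
    """
    events = []
    for line in text.splitlines():
        # Emit full-size chunks until the remainder of the line fits in one event.
        while len(line) > max_line_length:
            events.append(line[:max_line_length])
            line = line[max_line_length:]
        events.append(line)
    return events
```

If that assumption about the source holds, the source's deserializer.maxLineLength setting, rather than anything on the Avro or HDFS sink, would be the place to look.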

Here is our sink on the sending node.

agent.sinks = AvroSink
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = flume.mydomain.int
agent.sinks.AvroSink.port = 4169
agent.sinks.AvroSink.batchSize = 0
agent.sinks.AvroSink.rollSize = 0
agent.sinks.AvroSink.rollInterval = 0
agent.sinks.AvroSink.rollCount = 0
agent.sinks.AvroSink.idleTimeout = 0
agent.sinks.AvroSink.useLocalTimeStamp = true
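As far as I can tell from the Flume user guide, several keys in this block (rollSize, rollInterval, rollCount, idleTimeout, useLocalTimeStamp) are HDFS sink options, where they carry an hdfs. prefix, and the Avro sink spells its batch size as batch-size; keys a component does not read appear to be silently ignored. A quick way to spot such keys is to compare each sink property against a list of expected names. In the Python sketch below, AVRO_SINK_KEYS is a partial list assumed from the user guide and should be checked against the Flume version actually deployed; unexpected_keys is a hypothetical helper.

```python
# Partial list of Avro sink property names, assumed from the Flume user guide;
# verify against the documentation for the Flume version actually deployed.
AVRO_SINK_KEYS = {
    "type", "channel", "hostname", "port", "batch-size",
    "connect-timeout", "request-timeout", "compression-type",
}

def unexpected_keys(properties_text, sink_name, agent="agent", known=AVRO_SINK_KEYS):
    """Return property names under <agent>.sinks.<sink_name>. that are not in `known`."""
    prefix = f"{agent}.sinks.{sink_name}."
    found = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith(prefix):
            name = key[len(prefix):]
            if name not in known:
                found.append(name)
    return found
```

Applied to the sending-side block above with sink_name="AvroSink", it would flag batchSize, rollSize, rollInterval, rollCount, idleTimeout and useLocalTimeStamp.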

Here is our sink on the HDFS receiving side.

dp1.sinks.sinkCN.type = hdfs
dp1.sinks.sinkCN.channel = channelCN
dp1.sinks.sinkCN.hdfs.filePrefix = %{basename}-
dp1.sinks.sinkCN.hdfs.path = hdfs://sf1-hadoopnn1.mydomain.int/flume/events/ods/cn/fe_event/%{host}/%y-%m-%d
dp1.sinks.sinkCN.hdfs.fileType = DataStream
dp1.sinks.sinkCN.hdfs.writeFormat = Text
dp1.sinks.sinkCN.hdfs.rollSize = 0
dp1.sinks.sinkCN.hdfs.rollCount = 0
dp1.sinks.sinkCN.hdfs.batchSize = 5000
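On the receiving side, the output path and file prefix depend on event headers. Per the Flume user guide, %{name} in hdfs.path or hdfs.filePrefix is replaced with the value of the event header called name (basename, for example, is a header the Spooling Directory source can add when basenameHeader is enabled), and %y, %m and %d are the two-digit year, month and day taken from the event's timestamp header, or from local time if hdfs.useLocalTimeStamp is true. The Python sketch below illustrates that expansion for a hypothetical event; expand_path and the header values are made up, and it ignores hdfs.timeZone and the many other escapes Flume supports.

```python
import re
from datetime import datetime, timezone

def expand_path(template, headers):
    """Expand %{header}, %y, %m and %d in an HDFS-sink-style path template.

    Simplifications (assumptions of this sketch, not Flume's exact behavior):
    dates come from the 'timestamp' header in epoch milliseconds and are
    rendered in UTC; a missing header becomes an empty string.
    """
    # Replace %{name} with the value of the event header called `name`.
    out = re.sub(r"%\{([^}]+)\}", lambda m: headers.get(m.group(1), ""), template)
    ts = datetime.fromtimestamp(int(headers["timestamp"]) / 1000, tz=timezone.utc)
    # Two-digit year, month and day, matching the %y-%m-%d used in hdfs.path above.
    for escape in ("%y", "%m", "%d"):
        out = out.replace(escape, ts.strftime(escape))
    return out
```

For a hypothetical event with host=web01 and a timestamp of 1441044193000 (2015-08-31 UTC), the template /flume/events/ods/cn/fe_event/%{host}/%y-%m-%d expands to /flume/events/ods/cn/fe_event/web01/15-08-31.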