flume-user mailing list archives

From Christopher Surage <csur...@gmail.com>
Subject Problem setting the rollInterval for HDFS sink
Date Thu, 24 Oct 2013 13:44:27 GMT
Hello, I am having an issue increasing the size of the files that get
written into my HDFS. I have tried adjusting the rollCount attribute of the
HDFS sink, but it seems to cap at 10 lines of text per file, so many small
files are written to the HDFS directory. That is why I need to change
this.

I have 2 boxes running:
1) uses a spooldir source to check for new log files copied to a specific
dir. It then sends the events through a mem channel to an avro sink, which
forwards them to the other box with the hdfs on it.

2) uses an avro source and sends events to the hdfs sink.


configurations:

1.
# Name the components of the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1


###############Describe/configure the source#################
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /u1/csurage/flume_test
a1.sources.r1.channels = c1
#a1.sources.r1.fileHeader = true


##############describe the sink#######################
# file roll sink
#a1.sinks.k1.type = file_roll
#a1.sinks.k1.sink.directory = /u1/csurage/target_flume

# Avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 45.32.96.136
a1.sinks.k1.port = 9311


# Channel the sink connects to
a1.sinks.k1.channel = c1

################describe the channel##################
# use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.byteCapacity = 0



2. Note: whenever I change any of the attributes in bold (marked with
asterisks), the files written to HDFS still roll at 10 lines.

# Name the components of the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1


###############Describe/configure the source#################
a1.sources.r1.type = avro
a1.sources.r1.bind = 45.32.96.136
a1.sources.r1.port = 9311
a1.sources.r1.channels = c1
#a1.sources.r1.fileHeader = true


##############describe the sink#######################
# HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/csurage/hive
a1.sinks.k1.hdfs.fileType = DataStream
*a1.sinks.k1.hdfs.rollsize = 0*
*a1.sinks.k1.hdfs.rollCount = 20   *
*a1.sinks.k1.hdfs.rollInterval = 0*


# Channel the sink connects to
a1.sinks.k1.channel = c1


################describe the channel##################
# use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.byteCapacity = 0
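
For comparison, here is a minimal sketch of the same sink section using the
property spellings from the Flume user guide's HDFS sink table. That table
lists the size threshold as hdfs.rollSize (capital S) and gives defaults of
rollInterval = 30 seconds, rollSize = 1024 bytes and rollCount = 10 events.
If the lowercase rollsize above is not recognized, the 1024-byte default may
still be in effect, which could account for the small files. This is an
assumption and has not been verified against this setup.

```properties
# Sketch only: property names follow the Flume user guide's HDFS sink section.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/csurage/hive
a1.sinks.k1.hdfs.fileType = DataStream
# Capital S; 0 disables size-based rolling (default is 1024 bytes)
a1.sinks.k1.hdfs.rollSize = 0
# Roll after 20 events (default is 10)
a1.sinks.k1.hdfs.rollCount = 20
# 0 disables time-based rolling (default is 30 seconds)
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.channel = c1
```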


Any help would be greatly appreciated; I have been stuck on this for
2 days.

regards,

Chris
