flume-user mailing list archives

From Ryan Suarez <ryan.sua...@sheridancollege.ca>
Subject Re: preserve syslog header in hdfs sink
Date Wed, 02 Apr 2014 18:21:05 GMT
Ok, I've added hdfs.fileType = datastream and sink.serializer = 
header_and_text, but I'm still seeing the logs written in SequenceFile 
format.  Any ideas?
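For what it's worth, the binary junk at the start of that earlier HDFS output is just the SequenceFile container header: the magic bytes "SEQ", a version byte, then the key and value class names, each stored as a length byte followed by UTF-8 text.  That length byte is why a '!' shows up right before org.apache.hadoop.io.LongWritable: 0x21 is 33, the length of the class name.  A minimal Python sketch of the decoding, no Hadoop required, just to illustrate the layout:

```python
# Decode the start of a Hadoop SequenceFile header by hand.
# Layout: b"SEQ", one version byte, then the key/value class names,
# each stored as a length followed by UTF-8 text (for names shorter
# than 128 bytes, Hadoop's variable-length encoding is a single byte).
header = (b"SEQ\x06"
          b"\x21org.apache.hadoop.io.LongWritable"
          b"\x22org.apache.hadoop.io.BytesWritable")

assert header[:3] == b"SEQ"           # container magic
version = header[3]                   # format version byte

pos = 4
klen = header[pos]                    # 0x21 == 33 == '!' in the garbled log
key_class = header[pos + 1 : pos + 1 + klen].decode()
pos += 1 + klen
vlen = header[pos]                    # 0x22 == 34 == '"' in the garbled log
value_class = header[pos + 1 : pos + 1 + vlen].decode()

print(key_class, value_class)
```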

-----
flume@hadoop-t1:~$ flume-ng version
Flume 1.4.0.2.0.11.0-1
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: fcdc3d29a1f249bef653b10b149aea2bc5df892e
Compiled by jenkins on Wed Mar 12 05:11:30 PDT 2014
From source with checksum dea9ae30ce2c27486ae7c76ab7aba020


-----
root@hadoop-t1:/etc/flume/conf# cat flume-conf.properties
# Name the components on this agent
hadoop-t1.sources = r1
hadoop-t1.sinks = s1
hadoop-t1.channels = mem1

# Describe/configure the source
hadoop-t1.sources.r1.type = syslogtcp
hadoop-t1.sources.r1.host = localhost
hadoop-t1.sources.r1.port = 10005
hadoop-t1.sources.r1.portHeader = port
hadoop-t1.sources.r1.interceptors = i1 i2
hadoop-t1.sources.r1.interceptors.i1.type = timestamp
hadoop-t1.sources.r1.interceptors.i2.type = host
hadoop-t1.sources.r1.interceptors.i2.hostHeader = hostname

##HDFS Sink
hadoop-t1.sinks.s1.type = hdfs
hadoop-t1.sinks.s1.fileType = DataStream
hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
hadoop-t1.sinks.s1.hdfs.batchSize = 1
hadoop-t1.sinks.s1.serializer = header_and_text
hadoop-t1.sinks.s1.serializer.columns = timestamp hostname
hadoop-t1.sinks.s1.serializer.format = CSV
hadoop-t1.sinks.s1.serializer.appendNewline = true

## MEM  Use a channel which buffers events in memory
hadoop-t1.channels.mem1.type = memory
hadoop-t1.channels.mem1.capacity = 1000
hadoop-t1.channels.mem1.transactionCapacity = 100

# Bind the source and sink to the channel
hadoop-t1.sources.r1.channels = mem1
hadoop-t1.sinks.s1.channel = mem1
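One thing worth double-checking in the config above, assuming the Flume 1.4 property naming: most HDFS sink settings only take effect with the hdfs. prefix, so a bare fileType key is silently ignored and the sink stays on its SequenceFile default, while serializer is a sink-level key that takes no prefix.  A sketch of just the two lines in question, under that assumption:

---
# hdfs.-prefixed, per the Flume 1.4 HDFS sink docs (assumed naming)
hadoop-t1.sinks.s1.hdfs.fileType = DataStream
# serializer is a sink-level key, no hdfs. prefix
hadoop-t1.sinks.s1.serializer = header_and_text
---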

On 14-04-01 12:13 PM, Jeff Lord wrote:
> Well, you are writing a sequence file (the default).  Is that what you want?
> If you want text use:
>
> hdfs.fileType = datastream
>
> and for the serializer you should be able to just use:
>
> a1.sinks.k1.sink.serializer = header_and_text
>
>
>
> On Tue, Apr 1, 2014 at 8:02 AM, Ryan Suarez 
> <ryan.suarez@sheridancollege.ca> wrote:
>
>     Thanks for the tip!  I was indeed missing the interceptors.  I've
>     added them now, but the timestamp and hostname are still not showing
>     up in the hdfs log.  Any advice?
>
>
>     ------- sample event in HDFS ------
>     SEQ
>     !org.apache.hadoop.io.LongWritable”org.apache.hadoop.io.BytesWritable������cc�c��I�[��ڳ\�����`���
>     �� E � ����Tsu[28432]: pam_unix(su:session): session opened for
>     user root by myuser(uid=31043)
>
>     ------ same event in syslog ------
>     Mar 31 16:18:32 hadoop-t1 su[28432]: pam_unix(su:session): session
>     opened for user root by myuser(uid=31043)
>
>     ------- flume-conf.properties --------
>
>     # Name the components on this agent
>     hadoop-t1.sources = r1
>     hadoop-t1.sinks = s1
>
>     hadoop-t1.channels = mem1
>
>     # Describe/configure the source
>     hadoop-t1.sources.r1.type = syslogtcp
>     hadoop-t1.sources.r1.host = localhost
>     hadoop-t1.sources.r1.port = 10005
>     hadoop-t1.sources.r1.portHeader = port
>     hadoop-t1.sources.r1.interceptors = i1 i2
>     hadoop-t1.sources.r1.interceptors.i1.type = timestamp
>     hadoop-t1.sources.r1.interceptors.i2.type = host
>     hadoop-t1.sources.r1.interceptors.i2.hostHeader = hostname
>
>     ##HDFS Sink
>     hadoop-t1.sinks.s1.type = hdfs
>     hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
>     hadoop-t1.sinks.s1.hdfs.batchSize = 1
>     hadoop-t1.sinks.s1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer$Builder
>     hadoop-t1.sinks.s1.serializer.columns = timestamp hostname
>     hadoop-t1.sinks.s1.serializer.format = CSV
>     hadoop-t1.sinks.s1.serializer.appendNewline = true
>
>     ## MEM  Use a channel which buffers events in memory
>
>     hadoop-t1.channels.mem1.type = memory
>     hadoop-t1.channels.mem1.capacity = 1000
>     hadoop-t1.channels.mem1.transactionCapacity = 100
>
>     # Bind the source and sink to the channel
>     hadoop-t1.sources.r1.channels = mem1
>     hadoop-t1.sinks.s1.channel = mem1
>
>
>
>     On 14-03-28 3:37 PM, Jeff Lord wrote:
>>     Do you have the appropriate interceptors configured?
>>
>>
>>     On Fri, Mar 28, 2014 at 12:28 PM, Ryan Suarez
>>     <ryan.suarez@sheridancollege.ca> wrote:
>>
>>         RTFM indicates I need the following sink properties:
>>
>>         ---
>>         hadoop-t1.sinks.hdfs1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer
>>         hadoop-t1.sinks.hdfs1.serializer.columns = timestamp hostname msg
>>         hadoop-t1.sinks.hdfs1.serializer.format = CSV
>>         hadoop-t1.sinks.hdfs1.serializer.appendNewline = true
>>         ---
>>
>>         But I'm still not getting timestamp information.  How would I
>>         get hostname and timestamp information in the logs?
>>
>>
>>         On 14-03-26 3:02 PM, Ryan Suarez wrote:
>>
>>             Greetings,
>>
>>             I'm running the Flume that ships with Hortonworks HDP2 to
>>             feed syslogs to HDFS.  The problem is that the timestamp
>>             and hostname of the event are not logged to HDFS.
>>
>>             ---
>>             flume@hadoop-t1:~$ hadoop fs -cat
>>             /opt/logs/hadoop-t1/2014-03-26/FlumeData.1395859766307
>>             SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??Ak?i<??G??`D??$hTsu[22209]:
>>             pam_unix(su:session): session opened for user root by
>>             someuser(uid=11111)
>>             ---
>>
>>             How do I configure the sink to add hostname and timestamp
>>             info to the event?
>>
>>             Here's my flume-conf.properties:
>>
>>             ---
>>             flume@hadoop-t1:/etc/flume/conf$ cat flume-conf.properties
>>             # Name the components on this agent
>>             hadoop-t1.sources = syslog1
>>             hadoop-t1.sinks = hdfs1
>>             hadoop-t1.channels = mem1
>>
>>             # Describe/configure the source
>>             hadoop-t1.sources.syslog1.type = syslogtcp
>>             hadoop-t1.sources.syslog1.host = localhost
>>             hadoop-t1.sources.syslog1.port = 10005
>>             hadoop-t1.sources.syslog1.portHeader = port
>>
>>             ##HDFS Sink
>>             hadoop-t1.sinks.hdfs1.type = hdfs
>>             hadoop-t1.sinks.hdfs1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
>>             hadoop-t1.sinks.hdfs1.hdfs.batchSize = 1
>>
>>             # Use a channel which buffers events in memory
>>             hadoop-t1.channels.mem1.type = memory
>>             hadoop-t1.channels.mem1.capacity = 1000
>>             hadoop-t1.channels.mem1.transactionCapacity = 100
>>
>>             # Bind the source and sink to the channel
>>             hadoop-t1.sources.syslog1.channels = mem1
>>             hadoop-t1.sinks.hdfs1.channel = mem1
>>             ---
>>
>>             ---
>>             flume@hadoop-t1:~$ flume-ng version
>>             Flume 1.4.0.2.0.11.0-1
>>             Source code repository:
>>             https://git-wip-us.apache.org/repos/asf/flume.git
>>             Revision: fcdc3d29a1f249bef653b10b149aea2bc5df892e
>>             Compiled by jenkins on Wed Mar 12 05:11:30 PDT 2014
>>             From source with checksum dea9ae30ce2c27486ae7c76ab7aba020
>>             ---
>>
>>
>>
>
>

