flume-user mailing list archives

From Ryan Suarez <ryan.sua...@sheridancollege.ca>
Subject Re: preserve syslog header in hdfs sink
Date Wed, 02 Apr 2014 18:32:06 GMT
Oops, my bad.  There was a typo in my config file: I had put 
fileType=datastream instead of hdfs.fileType=datastream.  Thanks Jeff!  
It's working for me now; I see the timestamp and hostname.
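
For the record, in case anyone finds this thread in the archives, the sink 
section now reads roughly as below (same agent and sink names as the config 
quoted further down; the only real change is the hdfs. prefix on fileType):

---
hadoop-t1.sinks.s1.type = hdfs
hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
hadoop-t1.sinks.s1.hdfs.batchSize = 1
hadoop-t1.sinks.s1.hdfs.fileType = DataStream
hadoop-t1.sinks.s1.serializer = header_and_text
hadoop-t1.sinks.s1.serializer.appendNewline = true
---

With hdfs.fileType = DataStream the sink writes plain text instead of a 
SequenceFile, and the header_and_text serializer puts the event headers (the 
timestamp and hostname set by the interceptors) in front of each event body.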

regards,
Ryan

On 14-04-02 2:21 PM, Ryan Suarez wrote:
> Ok, I've added hdfs.fileType = datastream and sink.serializer = 
> header_and_text.  But I'm still seeing the logs written in sequence 
> format.  Any ideas?
>
> -----
> flume@hadoop-t1:~$ flume-ng version
> Flume 1.4.0.2.0.11.0-1
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: fcdc3d29a1f249bef653b10b149aea2bc5df892e
> Compiled by jenkins on Wed Mar 12 05:11:30 PDT 2014
> From source with checksum dea9ae30ce2c27486ae7c76ab7aba020
>
>
> -----
> root@hadoop-t1:/etc/flume/conf# cat flume-conf.properties
> # Name the components on this agent
> hadoop-t1.sources = r1
> hadoop-t1.sinks = s1
> hadoop-t1.channels = mem1
>
> # Describe/configure the source
> hadoop-t1.sources.r1.type = syslogtcp
> hadoop-t1.sources.r1.host = localhost
> hadoop-t1.sources.r1.port = 10005
> hadoop-t1.sources.r1.portHeader = port
> hadoop-t1.sources.r1.interceptors = i1 i2
> hadoop-t1.sources.r1.interceptors.i1.type = timestamp
> hadoop-t1.sources.r1.interceptors.i2.type = host
> hadoop-t1.sources.r1.interceptors.i2.hostHeader = hostname
>
> ##HDFS Sink
> hadoop-t1.sinks.s1.type = hdfs
> hadoop-t1.sinks.s1.fileType = *DataStream*
> hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
> hadoop-t1.sinks.s1.hdfs.batchSize = 1
> hadoop-t1.sinks.s1.serializer = *header_and_text*
> hadoop-t1.sinks.s1.serializer.columns = timestamp hostname
> hadoop-t1.sinks.s1.serializer.format = CSV
> hadoop-t1.sinks.s1.serializer.appendNewline = true
>
> ## MEM  Use a channel which buffers events in memory
> hadoop-t1.channels.mem1.type = memory
> hadoop-t1.channels.mem1.capacity = 1000
> hadoop-t1.channels.mem1.transactionCapacity = 100
>
> # Bind the source and sink to the channel
> hadoop-t1.sources.r1.channels = mem1
> hadoop-t1.sinks.s1.channel = mem1
>
> On 14-04-01 12:13 PM, Jeff Lord wrote:
>> Well, you are writing a sequence file (the default). Is that what you want?
>> If you want text, use:
>>
>> hdfs.fileType = datastream
>>
>> and for the serializer you should be able to just use:
>>
>> a1.sinks.k1.sink.serializer = header_and_text
>>
>>
>>
>> On Tue, Apr 1, 2014 at 8:02 AM, Ryan Suarez 
>> <ryan.suarez@sheridancollege.ca> wrote:
>>
>>     Thanks for the tip!  I was indeed missing the interceptors.  I've
>>     added them now, but the timestamp and hostname are still not
>>     showing up in the HDFS log. Any advice?
>>
>>
>>     ------- sample event in HDFS ------
>>     SEQ
>>     !org.apache.hadoop.io.LongWritable”org.apache.hadoop.io.BytesWritable������cc�c��I�[��ڳ\�����`���
>>     �� E � ����Tsu[28432]: pam_unix(su:session): session opened for
>>     user root by myuser(uid=31043)
>>
>>     ------ same event in syslog ------
>>     Mar 31 16:18:32 hadoop-t1 su[28432]: pam_unix(su:session):
>>     session opened for user root by myuser(uid=31043)
>>
>>     ------- flume-conf.properties --------
>>
>>     # Name the components on this agent
>>     hadoop-t1.sources = r1
>>     hadoop-t1.sinks = s1
>>
>>     hadoop-t1.channels = mem1
>>
>>     # Describe/configure the source
>>     hadoop-t1.sources.r1.type = syslogtcp
>>     hadoop-t1.sources.r1.host = localhost
>>     hadoop-t1.sources.r1.port = 10005
>>     hadoop-t1.sources.r1.portHeader = port
>>     hadoop-t1.sources.r1.interceptors = i1 i2
>>     hadoop-t1.sources.r1.interceptors.i1.type = timestamp
>>     hadoop-t1.sources.r1.interceptors.i2.type = host
>>     hadoop-t1.sources.r1.interceptors.i2.hostHeader = hostname
>>
>>     ##HDFS Sink
>>     hadoop-t1.sinks.s1.type = hdfs
>>     hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
>>     hadoop-t1.sinks.s1.hdfs.batchSize = 1
>>     hadoop-t1.sinks.s1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer$Builder
>>     hadoop-t1.sinks.s1.serializer.columns = timestamp hostname
>>     hadoop-t1.sinks.s1.serializer.format = CSV
>>     hadoop-t1.sinks.s1.serializer.appendNewline = true
>>
>>     ## MEM  Use a channel which buffers events in memory
>>
>>     hadoop-t1.channels.mem1.type = memory
>>     hadoop-t1.channels.mem1.capacity = 1000
>>     hadoop-t1.channels.mem1.transactionCapacity = 100
>>
>>     # Bind the source and sink to the channel
>>     hadoop-t1.sources.r1.channels = mem1
>>     hadoop-t1.sinks.s1.channel = mem1
>>
>>
>>
>>     On 14-03-28 3:37 PM, Jeff Lord wrote:
>>>     Do you have the appropriate interceptors configured?
>>>
>>>
>>>     On Fri, Mar 28, 2014 at 12:28 PM, Ryan Suarez
>>>     <ryan.suarez@sheridancollege.ca> wrote:
>>>
>>>         RTFM indicates I need the following sink properties:
>>>
>>>         ---
>>>         hadoop-t1.sinks.hdfs1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer
>>>         hadoop-t1.sinks.hdfs1.serializer.columns = timestamp hostname msg
>>>         hadoop-t1.sinks.hdfs1.serializer.format = CSV
>>>         hadoop-t1.sinks.hdfs1.serializer.appendNewline = true
>>>         ---
>>>
>>>         But I'm still not getting timestamp information.  How would
>>>         I get hostname and timestamp information in the logs?
>>>
>>>
>>>         On 14-03-26 3:02 PM, Ryan Suarez wrote:
>>>
>>>             Greetings,
>>>
>>>             I'm running the Flume that ships with Hortonworks HDP2
>>>             to feed syslogs to HDFS.  The problem is that the timestamp
>>>             and hostname of the event are not logged to HDFS.
>>>
>>>             ---
>>>             flume@hadoop-t1:~$ hadoop fs -cat
>>>             /opt/logs/hadoop-t1/2014-03-26/FlumeData.1395859766307
>>>             SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??Ak?i<??G??`D??$hTsu[22209]:
>>>             pam_unix(su:session): session opened for user root by
>>>             someuser(uid=11111)
>>>             ---
>>>
>>>             How do I configure the sink to add hostname and
>>>             timestamp info to the event?
>>>
>>>             Here's my flume-conf.properties:
>>>
>>>             ---
>>>             flume@hadoop-t1:/etc/flume/conf$ cat flume-conf.properties
>>>             # Name the components on this agent
>>>             hadoop-t1.sources = syslog1
>>>             hadoop-t1.sinks = hdfs1
>>>             hadoop-t1.channels = mem1
>>>
>>>             # Describe/configure the source
>>>             hadoop-t1.sources.syslog1.type = syslogtcp
>>>             hadoop-t1.sources.syslog1.host = localhost
>>>             hadoop-t1.sources.syslog1.port = 10005
>>>             hadoop-t1.sources.syslog1.portHeader = port
>>>
>>>             ##HDFS Sink
>>>             hadoop-t1.sinks.hdfs1.type = hdfs
>>>             hadoop-t1.sinks.hdfs1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
>>>             hadoop-t1.sinks.hdfs1.hdfs.batchSize = 1
>>>
>>>             # Use a channel which buffers events in memory
>>>             hadoop-t1.channels.mem1.type = memory
>>>             hadoop-t1.channels.mem1.capacity = 1000
>>>             hadoop-t1.channels.mem1.transactionCapacity = 100
>>>
>>>             # Bind the source and sink to the channel
>>>             hadoop-t1.sources.syslog1.channels = mem1
>>>             hadoop-t1.sinks.hdfs1.channel = mem1
>>>             ---
>>>
>>>             ---
>>>             flume@hadoop-t1:~$ flume-ng version
>>>             Flume 1.4.0.2.0.11.0-1
>>>             Source code repository:
>>>             https://git-wip-us.apache.org/repos/asf/flume.git
>>>             Revision: fcdc3d29a1f249bef653b10b149aea2bc5df892e
>>>             Compiled by jenkins on Wed Mar 12 05:11:30 PDT 2014
>>>             From source with checksum dea9ae30ce2c27486ae7c76ab7aba020
>>>             ---
>>>
>>>
>>>
>>
>>
>

