flume-user mailing list archives

From Christian Schroer <cschr...@autoscout24.com>
Subject AW: Problems with time variables in HDFS path
Date Wed, 11 Jul 2012 15:31:19 GMT
Hey Alex,

I used the logger command to generate a syslog message and rsyslogd to send it. I did this
to rule out any malformed message. rsyslogd speaks RFC 3164 by default. If I use rsyslogd to
receive and store the message, all the information is fine, so the message itself should be correct.

Running your command breaks the cdh4.0.1 flume-ng version. And as far as I can see from your
pasted output, it is broken in your version too. The host is filled with "a", but should be
"host".

I also tried writing to a logger sink. This doesn't break Flume, but it shows the problem
more clearly. Only writing to an HDFS sink breaks it (if you use %Y and so on inside the path).

echo "<13>Jun 20 12:12:12 host foo[345]: a syslog message with" > /tmp/foo; nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:58,779 INFO sink.LoggerSink: Event: { headers:{timestamp=1340187132000, Severity=5, host=host, Facility=8} body: 66 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }

As you see, everything is fine. Timestamp is set, host is filled correctly and the HDFS sink
would be able to process this message.

echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with" > /tmp/foo; nc -v aHostname 5140 < /tmp/foo
2012-07-11 16:42:34,006 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, Facility=8} body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog message w }

This one is broken, host is "a" and no timestamp :)
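
To illustrate why the missing header matters, here is a minimal Java sketch. It is not Flume's actual BucketPath code: the class name, the method name, the example path and the use of UTC are all assumptions made for illustration. It expands %Y, %m and %d from a "timestamp" header and fails with a NumberFormatException when that header is absent, which is the same kind of failure the HDFS sink reports.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only, NOT Flume's implementation. All names are hypothetical.
public class TimestampHeaderSketch {

    // Expands %Y, %m and %d in a path using the "timestamp" header (epoch milliseconds).
    // If the header is missing, headers.get(...) returns null and Long.valueOf(null)
    // throws NumberFormatException.
    public static String expandPath(String path, Map<String, String> headers) {
        long millis = Long.valueOf(headers.get("timestamp"));
        // UTC is an assumption here; a real agent may use its local time zone.
        ZonedDateTime t = Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC);
        return path
                .replace("%Y", String.format(Locale.ROOT, "%04d", t.getYear()))
                .replace("%m", String.format(Locale.ROOT, "%02d", t.getMonthValue()))
                .replace("%d", String.format(Locale.ROOT, "%02d", t.getDayOfMonth()));
    }

    public static void main(String[] args) {
        // Headers as in the first logger sink output: timestamp present.
        Map<String, String> good = new HashMap<>();
        good.put("timestamp", "1340187132000");
        good.put("host", "host");
        System.out.println(expandPath("/flume/%Y/%m/%d", good));

        // Headers as in the second output: host=a and no timestamp.
        Map<String, String> bad = new HashMap<>();
        bad.put("host", "a");
        try {
            expandPath("/flume/%Y/%m/%d", bad);
        } catch (NumberFormatException e) {
            System.out.println("path expansion failed: " + e);
        }
    }
}
```

With the headers from the first event the path can be expanded; with the headers from the second event the lookup returns null and the parse fails before any path is built.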

Best regards,
Chris

-----Original Message-----
From: alo alt [mailto:wget.null@gmail.com]
Sent: Wednesday, 11 July 2012 14:54
To: user@flume.apache.org
Subject: Re: Problems with time variables in HDFS path

Chris,

Syslog is an RFC-defined protocol; we support only the RFC 5424 and RFC 3164 formats. As long
as you use valid syslog events, it works:

echo "<13>Jun 20 12:12:12 host foo[345]: - a syslog message with -" > /tmp/foo; nc -v YOUR_IP 5140 < /tmp/foo

12/07/11 14:51:52 INFO sink.LoggerSink: Event: { headers:{Severity=5, host=a, Facility=8}
body: 73 79 73 6C 6F 67 20 6D 65 73 73 61 67 65 20 77 syslog message w }


- Alex

P.S. I didn't see your mail to me, but it's better to write to the list ;)



On Jul 11, 2012, at 12:15 PM, Juhani Connolly wrote:

> The time variables depend on the existence of a header with the key "timestamp". If it
> isn't there, the sink tries to parse a non-existent header to calculate the time, and you
> get this exception. I don't believe it has anything to do with the contents of your log message.
> 
> For the easiest way to add the header, I would recommend trying 1.2.0 
> as soon as it is released (or you can try grabbing the current release 
> candidate or even the 1.3.0 trunk which I'm running right now without 
> any serious issues), and using the TimestampInterceptor there. As this 
> is a frequent query I've made a jira to document this dependency 
> properly https://issues.apache.org/jira/browse/FLUME-1364
> 
> On 07/11/2012 06:41 PM, Christian Schroer wrote:
>> Hi,
>> 
>> we are running into a strange problem using Flume-NG 1.1.0 from CDH 4.0.1.
>> 
>> Setup:
>> Flume-NG opens a TCP syslog port, collects all messages and forwards them directly
>> into HDFS. This works fine until the point where we want to forward MS IIS Logs in W3C format.
>> The reason seems to be a " - " inside the log message. I could reproduce the problem using
>> rsyslogd forwarding all syslog messages to flume:
>> 
>> logger "Hello this is a test" => Works fine :)
>> 
>> logger "hello - this will break" => breaks flume :(
>> 
>> If I remove the time variables from the HDFS path in our configuration (attached)
>> everything is working fine...
>> 
>> Exception:
>> 
>> 2012-07-11 11:08:18,292 ERROR hdfs.HDFSEventSink: process failed
>> java.lang.NumberFormatException: null
>>         at java.lang.Long.parseLong(Long.java:375)
>>         at java.lang.Long.valueOf(Long.java:525)
>>         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2012-07-11 11:08:18,294 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
>> org.apache.flume.EventDeliveryException: java.lang.NumberFormatException: null
>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:469)
>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.NumberFormatException: null
>>         at java.lang.Long.parseLong(Long.java:375)
>>         at java.lang.Long.valueOf(Long.java:525)
>>         at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:220)
>>         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:310)
>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:402)
>>         ... 3 more
>> 
>> I attached our configuration in case something is broken in there.
>> 
>> Best regards,
>> 
>> Christian Schroer
>> 
> 
> 


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF

