flume-user mailing list archives

From Ahmed Vila <ahmed.v...@symphony.is>
Subject Re: flume sink and substitution variables
Date Fri, 29 Jul 2016 01:04:26 GMT
You can actually define a custom format for the Apache2 access log so that it
includes the virtual host:
https://httpd.apache.org/docs/2.4/logs.html#virtualhost
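
To make that concrete, here is a rough sketch (plain Python, not Flume code) of
what the pipeline would do once the vhost is the first field of each log line.
It assumes a LogFormat that starts with %v, such as the "comonvhost" example in
the Apache docs; the sample line, the regex and the helper names are my own
illustration, and the header name is the one from your example. The same regex
idea is what you would hand to the regex_extractor interceptor, and the HDFS
sink would then reference the header as %{ApacheVirtualHostname}.

```python
import re

# Assumed log line, as written by a LogFormat beginning with %v, e.g.
#   LogFormat "%v %l %u %t \"%r\" %>s %b" comonvhost
# The request, status and size below are made up for illustration.
SAMPLE_LINE = (
    'vhost2.example.com - - [29/Jul/2016:01:04:26 +0000] '
    '"POST /collect HTTP/1.1" 200 512'
)

# The first whitespace-delimited token is the virtual host.
VHOST_RE = re.compile(r'^(\S+)\s')


def extract_headers(line):
    """Hypothetical helper: mimic an interceptor that stores the first
    capture group in a header named ApacheVirtualHostname."""
    match = VHOST_RE.match(line)
    return {"ApacheVirtualHostname": match.group(1)} if match else {}


def resolve_path(template, headers):
    """Hypothetical helper: mimic %{header} substitution in a sink path.
    Missing headers are replaced with an empty string in this sketch."""
    return re.sub(r'%\{(\w+)\}',
                  lambda m: headers.get(m.group(1), ""),
                  template)


headers = extract_headers(SAMPLE_LINE)
path = resolve_path("/hdfs/logdata/%{ApacheVirtualHostname}", headers)
print(path)  # /hdfs/logdata/vhost2.example.com
```

If the line does not start with a vhost token, no header is set, so you would
want to decide what the sink path should fall back to in that case.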

If you're using Apache2 only as a click-stream endpoint, I would suggest
using nginx instead, as it's a perfect match for that in its base form,
without a fully featured application web server.
From the tests I did while developing a professional click-stream collection
solution, nginx turned out to be a couple of times faster for this
particular scenario.


On Fri, Jul 29, 2016 at 2:55 AM, Guyle Taber <guyle@gmtech.net> wrote:

> Thanks. Yeah, we're actually capturing JSON POST data in the Apache logs
> (not GET data), so at this point there is no hostname in the payload and
> we'd have to figure out a way to derive it from the virtual host.
>
> On Jul 28, 2016, at 5:33 PM, Ahmed Vila <ahmed.vila@symphony.is> wrote:
>
> I didn't quite get it - are you ingesting the Apache access log, or something else?
>
> Either way, there is a regex_extractor interceptor that you can configure to
> extract the hostname into an event header of your choice (e.g.
> ApacheVirtualHostname, referenced in the sink as %{ApacheVirtualHostname}).
> Of course, your event payload has to contain the vhost FQDN.
> https://flume.apache.org/FlumeUserGuide.html#regex-extractor-interceptor
>
> Then you can use that variable in the HDFS sink path, as you described.
>
>
> On Fri, Jul 29, 2016 at 12:57 AM, Guyle M. Taber <guyle@gmtech.net> wrote:
>
>> I’m trying to determine if I can use a substitution variable in the hdfs
>> file path that is derived from the apache virtual host name that is called
>> on a web server listening as multiple vhost names. Where is the
>> substitution variable %host deriving that value and is there another var I
>> can use? Or can I use an interceptor to somehow extract the apache virtual
>> hostname called?
>>
>> For instance, a single web server is hosting 3 virtual hosts.
>>
>> vhost1.example.com
>> vhost2.example.com
>> vhost3.example.com
>>
>> Can a single sink hdfs path be customized based on the vhost (not the
>> server’s system hostname) called?
>>
>> Something like   "/hdfs/logdata/%ApacheVirtualHostname"
>
>
>
>
> --
>
>
> Best regards,
>
>
> Ahmed Vila | Senior software engineer
>
> Mobile | +387 62 139 348
>
> Web | www.symphony.is
>
> Skype | wylla_av
>
> San Francisco | Sarajevo | Belgrade
>
> No one can whistle a symphony
>
>


