incubator-flume-user mailing list archives

From Brian Tran <briantra...@gmail.com>
Subject Re: Metadata parsing
Date Tue, 09 Aug 2011 08:55:02 GMT
I actually wrote an implementation last week. If no one else has already
done it, how do I go about adding it?

On Sat, Aug 6, 2011 at 3:25 AM, Lior Harel <harel.lior@gmail.com> wrote:

> Sure, let's do this. I'll join the dev mailing list and see if I can help
> with the implementation.
>
> On Aug 5, 2011, at 6:34 PM, Jonathan Hsieh wrote:
>
> Lior,
>
> Ah, good point, I misspoke. Thanks for correcting me!
>
> Unfortunately, you are correct: Flume currently can't do this
> out-of-the-box.
>
> It seems like a reasonable addition, and a patch would be gladly accepted if
> someone were to implement it. If you, Brian, or anyone else is interested
> in building this, let's move the discussion to
> flume-dev@incubator.apache.org!
>
> Thanks,
> Jon.
>
> On Fri, Aug 5, 2011 at 1:30 AM, Lior Harel <harel.lior@gmail.com> wrote:
>
>> Hi Jon,
>> I'm interested in the same use case that Brian asked about, but I'm not sure I
>> understand your answer. As far as I understand, the regex decorator can only
>> extract data from the event body, while the tailSrcFile attribute is part
>> of the metadata. Can the regex decorator somehow operate on it?
>>
>>
>> Lior
>>
>> On Aug 5, 2011, at 9:35 AM, Jonathan Hsieh wrote:
>>
>> [bcc flume-user@cloudera.org (deprecated), cc
>> flume-user@incubator.apache.org]
>>
>> Brian,
>>
>> The easiest way is to use the regex decorator to create a new attribute
>> and use that attribute to do output bucketing.
>>
>> http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_extractors
>>
>> Jon.
>>
>> On Mon, Jul 25, 2011 at 5:50 PM, Brian Tran <briantran86@gmail.com> wrote:
>>
>>> I want to do output bucketing based on the tailSrcFile metadata value
>>> set by the tailDir source. However, I only want part of the value for
>>> the destination path in HDFS.
>>>
>>> For example, I have an event with the tailSrcFile value
>>> "unwanted_prefix_category_name-2011-07-25.log" but only want to use
>>> "category_name" for output bucketing.
>>>
>>> What is the easiest way to do this?
>>>
>>
>>
>>
>> --
>> // Jonathan Hsieh (shay)
>> // Software Engineer, Cloudera
>> // jon@cloudera.com
>>
>>
>>
>>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>
>
>
>
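For reference, the extraction asked about in the original question (getting
"category_name" out of "unwanted_prefix_category_name-2011-07-25.log") can be
sketched with plain java.util.regex. This is only an illustration of the
pattern matching, not Flume code and not the implementation mentioned above:
the class name CategoryExtractor, the method extractCategory, and the assumed
file name layout (a literal "unwanted_prefix_" followed by the category, a
yyyy-MM-dd date, and ".log") are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flume: pulls the category out of a
// tailSrcFile-style file name.
public class CategoryExtractor {
    // Assumed layout: unwanted_prefix_<category>-yyyy-MM-dd.log
    private static final Pattern TAIL_SRC_FILE =
        Pattern.compile("^unwanted_prefix_(.+)-\\d{4}-\\d{2}-\\d{2}\\.log$");

    // Returns the captured category, or empty if the name does not match.
    public static Optional<String> extractCategory(String tailSrcFile) {
        Matcher m = TAIL_SRC_FILE.matcher(tailSrcFile);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String name = "unwanted_prefix_category_name-2011-07-25.log";
        // Fall back to a default bucket when the name has an unexpected shape.
        System.out.println(extractCategory(name).orElse("unknown"));
    }
}
```

Returning an Optional rather than null makes the non-matching case explicit,
so a caller can decide which bucket to use for files that do not follow the
assumed naming scheme.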
