flume-dev mailing list archives

From Attila Simon <s...@cloudera.com>
Subject Re: Add Support multiline and recursive directory in TaildirSource(Flume-1.7). And make the buffersize be configured
Date Wed, 06 Jul 2016 08:05:53 GMT
Hi tinawenqiao,

Thanks for moving this conversation from GitHub to the Flume dev list. I
believe this is the best place to discuss development efforts. As
mentioned, we generally don't accept pull requests, so please create one
or more JIRA issues (depending on how many separate issues you would like
to address) and attach your patch to each, as described at
https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute

Regarding your proposed changes:
1) Sounds like a good improvement for TaildirSource. Please check how
Spooling Directory Source supports something similar using a
deserializer; your logic could be part of a new deserializer.
2) Sounds like a good improvement for TaildirSource, but it would be good
to avoid inventing a new pattern syntax. (On a side note, please check
out the latest development on SpoolingDirSource: it can now check a
directory subtree recursively, so it might already have what you want
to achieve.)
3) The bugfix sounds awesome.
4) This looks very specific to your use case. I believe it could be a
little more generalised and/or driven by configuration.
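
To illustrate point 2, here is a minimal sketch of matching files in a directory subtree with the glob syntax that already ships with java.nio, rather than a new pattern syntax. It is not the TaildirSource or SpoolingDirSource implementation; the class GlobFileGroup and the method match are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flume.
public class GlobFileGroup {

    /**
     * Walks the subtree under root and returns the regular files whose
     * path relative to root matches the given java.nio glob.
     */
    public static List<Path> match(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> matcher.matches(root.relativize(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

With this sketch, a call such as GlobFileGroup.match(confDir, "*/01/[ab].log") selects a.log and b.log in the 01 directory of every immediate subdirectory of confDir. In this glob syntax a single * does not cross a directory boundary, while ** does.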


Attila Simon
Software Engineer
Email:   sati@cloudera.com

On Wed, Jul 6, 2016 at 5:42 AM, 黄鹏程 <gnuhpc@foxmail.com> wrote:
> Fantastic Features! Support for this pull!
> ------------------ Original Message ------------------
> From: "文乔";<315524513@qq.com>;
> Sent: Wednesday, July 6, 2016, 11:38 AM
> To: "dev"<dev@flume.apache.org>;
> Subject: Add Support multiline and recursive directory in TaildirSource(Flume-1.7). And make the buffersize be configured
> Hi all,
>    I submitted a pull request against flume-1.7 on GitHub: https://github.com/apache/flume/pull/54
>    The changes are as follows:
>    1.  Support multiline events. Users can define a regex that marks the start of a multiline event.
>         Add a parameter REGEX_START in TaildirSourceConfigurationConstants.java. REGEX_START
> is used to generate Flume events whose body contains multiple lines. The parameter determines
> where an event starts. The default value is "". If the value is "", each line ending
> in '\n' becomes one Flume event.
>         Sample usage:
>         agent.sources.taildirsource.lineStartRegex =  \\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d
>    2.   Support recursive directories. Wildcards are allowed in the directory name.
>          The function getMatchFiles() in ReliableTaildirEventReader.java is modified to support
> this.
>          Sample usage:
>          agent.sources.taildirsource.filegroups.f1 = /Users/wenqiao/work/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf/*/01/[ab].log
>    3.   Fix the bug that occurs when a line is longer than 8192 bytes, and make the buffer size configurable.
>          Add a parameter BUFFER_SIZE in TaildirSourceConfigurationConstants.java. BUFFER_SIZE
> defines the maximum number of bytes in one Flume event body. The default size
> is 8192.
>     4.  Put the filePath, hostname, and IP into the headers of a Flume event if the headers
> do not already contain those keys.
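
The multiline behaviour described in change 1 can be sketched roughly as follows. This is an illustration of the idea only, not the code from the pull request; the class MultilineGrouper and the method group are hypothetical names, and the input is assumed to be lines that have already been split on '\n'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not the code from the pull request.
public class MultilineGrouper {

    /**
     * Groups lines into event bodies. A line whose beginning matches
     * lineStartRegex opens a new event; every other line is appended to
     * the current event. An empty regex keeps one event per line.
     */
    public static List<String> group(List<String> lines, String lineStartRegex) {
        List<String> events = new ArrayList<>();
        if (lineStartRegex == null || lineStartRegex.isEmpty()) {
            events.addAll(lines);
            return events;
        }
        Pattern start = Pattern.compile(lineStartRegex);
        StringBuilder current = null;
        for (String line : lines) {
            // lookingAt() anchors the match at the beginning of the line only.
            boolean startsEvent = start.matcher(line).lookingAt();
            if (startsEvent || current == null) {
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }
}
```

With the sample regex above, a log line that starts with a timestamp followed by the lines of a stack trace would end up in a single event body. One design point this sketch glosses over: a source that tails a live file cannot know that the last event is complete until the next start line arrives, so a real implementation would presumably need a timeout or similar flush rule.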
