chukwa-dev mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines
Date Thu, 25 Apr 2013 04:33:12 GMT
MAX_READ_SIZE is a policy. As long as it is configurable to use either an
adaptive maximum size or a fixed limit, I think the new change will be
better for some use cases.

regards,
Eric
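
For readers of the archive, here is a minimal sketch of the "dynamic"
maxReadSize policy discussed in this thread. It is not Chukwa code: the class
name, the doubling step, and the hardLimit ceiling are assumptions made for
illustration. Only the trigger (a full buffer with bytesUsed == 0) and the
reset once the big line is sent come from the proposal quoted below.

```java
/**
 * Hypothetical sketch of an adaptive read-size policy (not part of Chukwa).
 * The cap stays at baseSize and grows only while a single line overflows it.
 */
final class AdaptiveReadSize {
    private final int baseSize;   // normal cap, e.g. 128 kB
    private final int hardLimit;  // assumed absolute ceiling to bound memory use
    private int current;          // cap to use for the next read

    AdaptiveReadSize(int baseSize, int hardLimit) {
        if (baseSize <= 0 || hardLimit < baseSize) {
            throw new IllegalArgumentException("need 0 < baseSize <= hardLimit");
        }
        this.baseSize = baseSize;
        this.hardLimit = hardLimit;
        this.current = baseSize;
    }

    /** Cap to apply to the next read. */
    int current() {
        return current;
    }

    /**
     * Call after each read.
     * bytesRead: bytes placed in the buffer by the read.
     * bytesUsed: bytes that were shipped as complete records.
     */
    void afterRead(int bytesRead, int bytesUsed) {
        if (bytesRead == current && bytesUsed == 0) {
            // Buffer was filled and no complete record was found:
            // the line is longer than the cap, so grow it (doubling is an assumption).
            current = (int) Math.min((long) current * 2, (long) hardLimit);
        } else if (bytesUsed > 0) {
            // The big line (or any record) went out: return to the initial value.
            current = baseSize;
        }
    }
}
```

With a 128 kB base and a 1 MB ceiling, three consecutive full reads without a
separator would raise the cap to 1 MB, and the first read that ships a record
would drop it back to 128 kB. Keeping a fixed limit is the same policy with
hardLimit equal to baseSize.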


On Tue, Apr 23, 2013 at 9:49 PM, Luangsay Sourygna <luangsay@gmail.com> wrote:

> Sure, we can statically increase maxReadSize in the configuration. But the
> fact is that we should handle two different situations:
> - when a file is growing rapidly and we want a quick response for the other
> files: this means we don't want too big a maxReadSize value (I guess this
> was the initial idea behind this parameter).
> - when a line in a file is much bigger than the other lines and its size
> can exceed the initial maxReadSize value: this means we would like
> a very high maxReadSize value.
>
> Since maxReadSize can't be small and high at the same time, I propose a
> "dynamic" value for this parameter.
> Usually, this parameter should be small (128 kB for instance), and when a
> very big line appears (when we have bufferRead == MAX_READ_SIZE AND
> bytesUsed == 0), we should temporarily increase its value. Then, once the
> big line is sent, go back to the initial value.
>
> Makes sense?
>
> Regards,
>
> Sourygna
>
>
>
> On Mon, Apr 22, 2013 at 6:25 AM, Eric Yang <eric818@gmail.com> wrote:
>
> > maxReadSize can be increased in the configuration.  If a larger
> > maxReadSize is preferred, we can update the default to a larger size.
> >
> > regards,
> > Eric
> >
> > On Sun, Apr 21, 2013 at 3:07 PM, Luangsay Sourygna <luangsay@gmail.com> wrote:
> >
> > > As I said before, I don't think Chukwa should handle those situations, since
> > > I think this is a "log rotation" problem.
> > > Personally, I have never seen such a problem (log4j RFA, for instance, has a
> > > kind of "flexible" size and every rotated file ends with a \n).
> > >
> > > On the other hand, there is a special situation I think Chukwa should take
> > > care of.
> > > The default value of the configuration property
> > > "chukwaAgent.fileTailingAdaptor.maxReadSize" is 128 kB, which means that if
> > > a line/record is bigger than that size, the record won't be sent by the
> > > agent.
> > > We'll get a warning in Chukwa's log, but the record will be lost (see
> > > the LWFTAdaptor.slurp() method).
> > > In such a case, would it be possible to temporarily increase MAX_READ_SIZE so
> > > that we are able to send
> > > one record on the wire?
> > >
> > > Regards,
> > >
> > > Sourygna
> > >
> > >
> > >
> > >
> > > On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <eric818@gmail.com> wrote:
> > >
> > > > Do we need to consider rotation based on size?  For example, take the last line of
> > > > a log file that reaches 300MB: there is no line break in the first file,
> > > > but the entry continues into the next rotated log and then has a line feed
> > > > delimiter.  If we are splitting lines based on \n, then we can reconstruct
> > > > the full line between the two files. I am not sure if this case needs to be
> > > > supported?
> > > >
> > > > regards,
> > > > Eric
> > > >
> > > >
> > > > On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <luangsay@gmail.com> wrote:
> > > >
> > > > > Well, the log4j socket adaptor may be great if you control the software that
> > > > > generates the logs.
> > > > > That is not usually my case: customers don't really like having to install
> > > > > a Chukwa agent
> > > > > on their production servers, so I don't want to think about telling them to
> > > > > change the logging system
> > > > > of their software.
> > > > >
> > > > > As for partial lines when log files rotate, I don't think this is something
> > > > > Chukwa should manage (what
> > > > > is more: how could Chukwa be aware there is a problem?).
> > > > > In my view, this would be an error of the "logrotate" system. As far as I
> > > > > know, the RFA and DRFA log4j
> > > > > appenders handle rotation quite well.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Sourygna
> > > > >
> > > > >
> > > > > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <eric818@gmail.com> wrote:
> > > > >
> > > > > > I think the best solution is to use the Log4j socket appender and the Chukwa log4j
> > > > > > socket adaptor to get the full log entry without worrying about line
> > > > > > feeds.  However, this solution only works with programs that are written in
> > > > > > Java, and it does not keep a copy of the existing log file on disk.
> > > > > >
> > > > > > I think your proposal is a good way to tail text files so that
> > > > > > only line-delimited entries are sent.  How do we handle a partial line when the log
> > > > > > file has rotated?
> > > > > >
> > > > > > regards,
> > > > > > Eric
> > > > > >
> > > > > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <luangsay@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > FileTailingAdaptor is great for tailing log files and sending them to Hadoop.
> > > > > > > However, the last line of a chunk is usually cut, which leads to some errors.
> > > > > > >
> > > > > > > I know that we can use CharFileTailingAdaptorUTF8 to solve this problem.
> > > > > > > Nonetheless, this adaptor calls the MapProcessor.process() method for every
> > > > > > > line in each chunk, thus slowing down the Demux phase a lot.
> > > > > > >
> > > > > > > I suggest creating a new adaptor that would combine the benefits of the two
> > > > > > > adaptors: the (Demux) speed of FileTailingAdaptor and
> > > > > > > the preservation of lines from CharFileTailingAdaptorUTF8.
> > > > > > >
> > > > > > > The implementation of extractRecords() would be:
> > > > > > > - a "for loop" over the buffer, starting from the end of the buffer and going
> > > > > > > backward
> > > > > > > - if we find a separator, save the offset and exit the loop
> > > > > > > - the rest of the method would be similar to CharFileTailingAdaptorUTF8.
> > > > > > >
> > > > > > > Could you guys please tell me what you think about it?
> > > > > > > How do you currently manage the "cut lines" with Chukwa?
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Sourygna
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
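
The extractRecords() idea from the first message of the thread (scan the
buffer backward, stop at the first separator found, and send only the bytes
up to that offset) could be sketched roughly as follows. This is a
standalone illustration: the class and method names are hypothetical and do
not match the signature of the real adaptor method, and '\n' is assumed as
the separator.

```java
import java.util.Arrays;

/** Hypothetical sketch of the backward separator scan (not Chukwa code). */
final class LastSeparatorScan {

    /**
     * Walks buf[0..length) from the end toward the start and returns the
     * index of the last separator byte, or -1 if the buffer holds none.
     */
    static int lastSeparator(byte[] buf, int length, byte separator) {
        for (int i = length - 1; i >= 0; i--) {
            if (buf[i] == separator) {
                return i; // save the offset and exit the loop
            }
        }
        return -1;
    }

    /**
     * Returns the prefix of the buffer that ends with the last '\n'
     * (separator included). The trailing partial line is left out so the
     * caller can re-read it on the next pass. Empty if no '\n' is present.
     */
    static byte[] completeLines(byte[] buf, int length) {
        int end = lastSeparator(buf, length, (byte) '\n');
        return Arrays.copyOfRange(buf, 0, end + 1);
    }
}
```

Because only one chunk boundary is computed per read, the chunk still
contains many lines, which is what keeps the Demux side as fast as with
FileTailingAdaptor. An empty result is the "bytesUsed == 0" case discussed
in the later replies.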
