Date: Sun, 21 Apr 2013 17:05:48 +0200
Subject: Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines
From: Luangsay Sourygna
To: chukwa-dev@incubator.apache.org

Here is the Jira I opened:
https://issues.apache.org/jira/browse/CHUKWA-686

While writing the JUnit tests, I discovered a small "error" in the classes
CharFileTailingAdaptorUTF8 and CharFileTailingAdaptorUTF8NewLineEscaped.
When we create the chunk, the whole buffer is passed to the constructor, so
the chunk gets both the useful data and the useless data:

    ChunkImpl event = new ChunkImpl(type, toWatch.getAbsolutePath(),
        buffOffsetInFile + bytesUsed, buf, this);

I think we should pass only the useful part of the data, like this:

    ChunkImpl event = new ChunkImpl(type, toWatch.getAbsolutePath(),
        buffOffsetInFile + bytesUsed, Arrays.copyOf(buf, bytesUsed), this);

This does not look like a real issue, because the hasNext() method of
AbstractProcessor.class ensures that we only process the useful part.
Still, I see two reasons to fix it:
- it makes CharFileTailingAdaptorUTF8 fail some of my unit tests
  (TestFileTailingAdaptorPreserveLines.testDontBreakLines() for instance)
  that should not fail for this adaptor.
- we send data over the network for nothing.

Since the useless part is always less than one line, it is not usually a big
deal: we only transfer a few bytes for nothing.
However, a customer of mine has a log file with lines as long as 300 kB (I
know, quite strange for a "log file"...), so in that case I think the fix is
worth it.

Regards,

Sourygna

On Fri, Apr 19, 2013 at 9:01 PM, Luangsay Sourygna wrote:

> Well, the log4j socket adaptor may be great if you control the software
> that generates the logs.
> That is not usually my case: customers don't really like having to install
> a Chukwa agent on their production servers, so I don't want to think about
> telling them to change the logging system of their software.
>
> As for partial lines when log files rotate, I don't think this is something
> Chukwa should manage (what is more: how could Chukwa be aware there is a
> problem?).
> In my view, this would be an error of the "logrotate" system. As far as I
> know, the RFA and DRFA log4j appenders handle rotation quite well.
>
> Regards,
>
> Sourygna
>
>
> On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang wrote:
>
>> I think the best solution is to use the Log4j socket appender and the
>> Chukwa log4j socket adaptor to get the full log entry without worrying
>> about line feeds. However, this solution only works with programs written
>> in Java, and it does not keep a copy of the existing log file on disk.
>>
>> I think your proposal is a good way to tail text files and send only
>> line-delimited entries. How do we handle partial lines when the log file
>> has rotated?
>>
>> regards,
>> Eric
>>
>> On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna wrote:
>>
>> > Hi all,
>> >
>> > FileTailingAdaptor is great to tail log files and send them to Hadoop.
>> > However, the last line of the chunk is usually cut, which leads to some
>> > errors.
>> >
>> > I know that we can use CharFileTailingAdaptorUTF8 to solve this problem.
>> > Nonetheless, this adaptor calls the MapProcessor.process() method for
>> > every line in each chunk, which slows down the Demux phase a lot.
>> >
>> > I suggest creating a new adaptor that would combine the benefits of the
>> > two adaptors: the (Demux) speed of FileTailingAdaptor and
>> > the preservation of lines from CharFileTailingAdaptorUTF8.
>> >
>> > The implementation of extractRecords() would be:
>> > - a "for" loop over the buffer, starting from the end of the buffer and
>> >   going backward
>> > - if we find a separator, save the offset and exit the loop
>> > - the rest of the method would be similar to CharFileTailingAdaptorUTF8.
>> >
>> > Could you please tell me what you think about it?
>> > How do you currently manage the "cut lines" with Chukwa?
>> >
>> > Regards,
>> >
>> > Sourygna
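
The backward scan proposed in the quoted message, combined with the
Arrays.copyOf() trimming suggested at the top of this mail, could be sketched
roughly as follows. This is only an illustration under stated assumptions: the
class LineBoundarySketch and its methods are hypothetical names, not Chukwa
code, and the separator is assumed to be a single '\n' byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Chukwa: it only illustrates the backward scan.
final class LineBoundarySketch {

    // Assumption: records are separated by a single '\n' byte.
    static final byte SEPARATOR = '\n';

    /** Number of bytes up to and including the last separator, or 0 if there is none. */
    static int bytesUsed(byte[] buf) {
        // Start from the end of the buffer and stop at the first separator found.
        for (int i = buf.length - 1; i >= 0; i--) {
            if (buf[i] == SEPARATOR) {
                return i + 1;
            }
        }
        return 0;
    }

    /** Copy of the complete lines only; the trailing partial line is left out. */
    static byte[] completeLines(byte[] buf) {
        return Arrays.copyOf(buf, bytesUsed(buf));
    }

    public static void main(String[] args) {
        byte[] buf = "line one\nline two\npartial".getBytes(StandardCharsets.UTF_8);
        byte[] chunk = completeLines(buf);
        // Prints the two complete lines; "partial" would wait for the next read.
        System.out.print(new String(chunk, StandardCharsets.UTF_8));
    }
}
```

In a real adaptor the trimmed array would presumably be what gets handed to the
ChunkImpl constructor, and the bytes after the last separator would be re-read
on the next pass. Scanning backward means that only the trailing partial line
is examined, whatever the size of the buffer.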