Date: Fri, 19 Apr 2013 21:01:56 +0200
Subject: Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines
From: Luangsay Sourygna
To: chukwa-dev@incubator.apache.org

Well, the log4j socket adaptor may be great if you control the software that
generates the logs. That is usually not my case: customers don't really like
having to install Chukwa agents on their production servers, so I don't even
want to think about asking them to change the logging system of their software.

As for partial lines when log files rotate, I don't think this is something
Chukwa should manage (what is more, how could Chukwa be aware that there is a
problem?). In my view, this would be an error of the "logrotate" system.
As far as I know, the RFA and DRFA log4j appenders handle rotation quite well.

Regards,

Sourygna

On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang wrote:

> I think the best solution is to use the Log4j socket appender and the Chukwa
> log4j socket adaptor to get the full log entry without worrying about line
> feeds. However, this solution only works with programs written in Java, and
> it does not keep a copy of the existing log file on disk.
>
> I think your proposal is a good idea for tailing text files so that only
> line-delimited entries are sent. How do we handle a partial line when the
> log file has rotated?
>
> regards,
> Eric
>
> On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna wrote:
>
> > Hi all,
> >
> > FileTailingAdaptor is great for tailing log files and sending them to
> > Hadoop. However, the last line of a chunk is usually cut, which leads to
> > some errors.
> >
> > I know that we can use CharFileTailingAdaptorUTF8 to solve this problem.
> > Nonetheless, this adaptor calls the MapProcessor.process() method for
> > every line in each chunk, which slows down the Demux phase a lot.
> >
> > I suggest creating a new adaptor that would combine the benefits of the
> > two adaptors: the (Demux) speed of FileTailingAdaptor and
> > the preservation of lines from CharFileTailingAdaptorUTF8.
> >
> > The implementation of extractRecords() would be:
> > - a "for" loop over the buffer, starting from the end of the buffer and
> >   going backward
> > - if we find a separator, save the offset and exit the loop
> > - the rest of the method would be similar to CharFileTailingAdaptorUTF8.
> >
> > Could you guys please tell me what you think about it?
> > How do you currently manage the "cut lines" with Chukwa?
> >
> > Regards,
> >
> > Sourygna
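
For illustration, the backward scan proposed for extractRecords() could look
roughly like the sketch below. It is not taken from the Chukwa code base: the
class name LastSeparatorScan and the method completeLength are hypothetical,
and the sketch assumes a single-byte separator such as '\n'. It only shows how
to find the cut point; building and sending the chunk is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Chukwa: finds where the last complete
// line ends so that a trailing partial line can be held back.
class LastSeparatorScan {

    // Scans buf[0..length) backward and returns the number of bytes up to
    // and including the last separator, or 0 if no separator is present.
    static int completeLength(byte[] buf, int length, byte separator) {
        for (int i = length - 1; i >= 0; i--) {
            if (buf[i] == separator) {
                return i + 1; // offset just past the last separator
            }
        }
        return 0; // no complete line in this buffer yet
    }

    public static void main(String[] args) {
        byte[] buf = "line one\nline two\npartial"
                .getBytes(StandardCharsets.US_ASCII);
        int n = completeLength(buf, buf.length, (byte) '\n');
        // The first n bytes hold only whole lines; the rest would be
        // re-read on the next pass (assumption about the caller).
        System.out.println(n); // prints 18
    }
}
```

With this approach the whole block of complete lines would go out as one
chunk, which is the point of the proposal, while the bytes after the returned
offset would wait for the next read.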