From: Brock Noland
Date: Tue, 18 Dec 2012 10:17:13 -0600
Subject: Re: Flume 1.3.0 - NFS + File Channel Performance
To: user@flume.apache.org

Hi,

Hmm, yes, in general performance is not going to be great over NFS, but
there haven't been any File Channel (FC) changes that stick out here. Could
you take 10 thread dumps of the agent running the file channel and 10 thread
dumps of the agent sending data to the agent with the file channel? (You can
send them to me directly, since the list won't take attachments.)

Are there any patterns, e.g. it works for 40 seconds, then times out, then
works for 39 seconds, and so on?

Brock

On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf wrote:
> Hi,
>
> We've run into a strange problem regarding NFS and File Channel performance
> while evaluating the new version of Flume.
>
> We had no issues with the previous version (1.2.0).
>
> Our configuration looks like this:
>
> · Node1:
>   (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel -> Avro Sink (-> Node 2)
>
> · Node2:
>   (Node1s ->) Avro Source -> File Channel -> Custom Sink
>
> Both the checkpoint and the data directories of the File Channels are on NFS
> shares. We use the same share for checkpoint and data directories, but
> different shares for each Node.
> Unfortunately, using local directories is not an option for us.
>
> The events are about 1 KB each, and the batch sizes are the following:
>
> · Avro RPC Clients: 1000
> · Custom Sources: 2000
> · Avro Sink: 5000
> · Custom Sink: 10000
>
> We are experiencing very slow File Channel performance compared to the
> previous version, and a high number of timeouts (almost always) in the Avro
> RPC Clients and the Avro Sink.
>
> Something like this:
>
> 2012-12-18 15:43:31,828 [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN org.apache.flume.sink.AvroSink - Failed to send event batch
> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Failed to send batch
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     ***
>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) [flume-ng-core-1.3.0.jar:1.3.0]
>     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***, port: *** }: Handshake timed out after 20000ms
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     ... 5 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: null
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228) ~[na:1.6.0_31]
>     at java.util.concurrent.FutureTask.get(FutureTask.java:91) ~[na:1.6.0_31]
>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278) ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>     ... 6 common frames omitted
>
> (I had to remove some details, sorry for that.)
>
> We managed to narrow down the root cause of the issue to the File Channel,
> because:
>
> · Everything works fine if we switch to the Memory Channel or to the Old
>   File Channel (1.2.0).
> · Everything works fine if we use local directories.
>
> We've tested this on multiple different PCs (both Windows and Linux).
>
> I spent the day debugging and profiling, but I could not find anything worth
> mentioning (nothing with excessive CPU usage, no threads waiting too long,
> etc.). The only problem is that File Channel takes and puts take far more
> time than with the previous version.
>
> Could someone please try the File Channel on an NFS share?
>
> Does anyone have similar issues?
>
> Thank you for your help.
>
> Regards,
> Rudolf
>
> Rudolf Rakos
> Morgan Stanley | ISG Technology
> Lechner Odon fasor 8 | Floor 06
> Budapest, 1095
> Phone: +36 1 881-4011
> Rudolf.Rakos@morganstanley.com
-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
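
One way to look for the kind of pattern asked about above (works for N seconds, times out, works again) is to pull the timestamps of the "Failed to send event batch" lines out of the agent log and print the gaps between them. The following is only a rough Python sketch. It assumes each log entry carries a timestamp in the same `yyyy-MM-dd HH:mm:ss,SSS` layout as the excerpt quoted above, and the function name `failure_gaps` is made up for illustration; it is not part of Flume.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the quoted log excerpt, e.g. "2012-12-18 15:43:31,828".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")

# Message text taken from the AvroSink warning in the quoted excerpt.
MARKER = "Failed to send event batch"


def failure_gaps(lines, marker=MARKER):
    """Return the seconds elapsed between consecutive lines containing `marker`.

    Lines without the marker, or without a recognizable timestamp
    (for example stack-trace continuation lines), are skipped.
    """
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        # %f accepts the three millisecond digits and scales them correctly.
        stamps.append(datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Calling it as `failure_gaps(open("flume.log"))` (the file name is a placeholder) returns a list of intervals in seconds; roughly constant values would suggest a periodic stall rather than random timeouts.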