Subject: Re: Need for UDP / Multicast Source
From: Andrew Otto
Date: Thu, 17 Jan 2013 11:26:34 -0500
To: user@flume.apache.org

Ok, I'm still struggling with this a bit. Here's what I've currently got going.

In order to make it easier to check what I am and am not receiving, I've narrowed the logs that I store in HDFS down to those originating from a single host (cp1044.wikimedia.org). Each host generates contiguous sequence numbers for each log line. I can use the sequence number to make sure I'm not missing lines from a host.

On another nearby node, I started a process to store all of the log lines originating from cp1044. I then started the Flume agent and waited 3 minutes for it to roll files 3 times. I currently have 4 HDFS sinks going, so this created a total of 12 files. I got the files out of HDFS, and then sorted them on their sequence numbers to get the first and last sequence number in this set of files.

I took those two border sequence numbers and extracted all of the log lines from cp1044 that fall between them from the capture on the nearby host (not using Flume). I should be able to compare the number of lines here with the number of lines in the 12 files I extracted from HDFS and Flume. If they are the same, then Flume and UDPSource are working!

Flume saved 19451 events to HDFS, and the number of raw events recorded outside of Flume and HDFS was 78176. I'm up to about 25% of the data! Better, but still not good enough. :(

This was for about 3 minutes of data, so for a single host, this shouldn't be more than 500 events per second. I must be doing something really wrong on the Flume tweaky side of things, eh?

Any more ideas? Thanks!

P.S. YOU GUYS ARE SO HELPFUL. Thanks so much for everything thus far.

On Jan 17, 2013, at 10:34 AM, Andrew Otto wrote:

>> with UDP there's no guaranty that the data will reach destination.
>
> True, but I'm experimenting with using Flume as a replacement for a system that is already in place. I actually got the numbers I listed below by grabbing data directly off of the UDP stream and saving them to a file on local disk. It's possible that UDP data is getting lost in the network somewhere, but if that were the case I wouldn't know about it. I am comparing Flume's performance to a single process writing to a local disk.
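
The check described in this message (take the first and last sequence numbers seen in the Flume output, then count how many lines inside that window the reference capture holds) could be sketched roughly as below. This is only an illustration, not the script actually used: the function name compare_capture and its inputs are hypothetical, and it assumes the sequence numbers have already been parsed out of the log lines as integers, since the log format is not given here.

```python
def compare_capture(flume_seqs, reference_seqs):
    """Compare sequence numbers stored via Flume with a reference capture.

    Both arguments are iterables of integer sequence numbers (an assumption;
    parsing them out of the raw log lines is left out of this sketch).
    """
    flume = set(flume_seqs)  # a set also drops any duplicate events
    if not flume:
        raise ValueError("no sequence numbers found in the Flume output")

    # Border sequence numbers of the Flume output.
    first, last = min(flume), max(flume)

    # Keep only the reference lines that fall inside that window.
    reference = {s for s in reference_seqs if first <= s <= last}
    missing = reference - flume

    return {
        "first": first,
        "last": last,
        "flume_count": len(flume),
        "reference_count": len(reference),
        "missing_count": len(missing),
        "fraction_received": len(flume & reference) / len(reference),
    }


# Arithmetic for the counts reported in this message.
received_pct = 19451 / 78176 * 100   # about 24.9 percent
events_per_sec = 78176 / (3 * 60)    # about 434 events per second
```

With the counts above, roughly a quarter of the window arrived in HDFS, and the reference rate of about 434 events per second is consistent with the "not more than 500 events per second" estimate.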