From: Bright D L
Subject: mapreduce for proxy log file analysis
Date: Sun, 1 Aug 2010 04:10:25 +0800
Message-Id: <6B5AFE19-52AE-41F9-A0D6-6BED2C5556B0@gmail.com>
To: mapreduce-user@hadoop.apache.org

Hi all,

I am working on a simple project to analyze HTTP proxy server logs with Hadoop MapReduce (in Java). The log file contains entries for a week, sometimes more.

I have the following requirements:

1) Find the top 50 bandwidth consumers (IPs) for each day
2) Find the hour of the day with the maximum bandwidth utilization

Please point me in the right direction. Sample code would be highly appreciated.

Thank you all,
Bright
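
For illustration, the two requirements above could be approached roughly as follows. This is only a minimal sketch in plain Java with no Hadoop classes, so it runs standalone. The log format is not given in the message, so the line layout "yyyy-MM-dd HH:mm:ss clientIp bytes" is purely an assumption, and the names ProxyLogSketch, parse, topConsumersPerDay and busiestHour are hypothetical. "Hour of the day" is read here as the hour (0-23) with the most bytes summed over the whole log, which is also an assumption. In an actual MapReduce job, the per-line parsing would sit in a mapper that emits a key such as (day, IP) or hour with the byte count as the value, and the summing would sit in a reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the aggregation logic; not a Hadoop job.
// ASSUMED (hypothetical) line format: "yyyy-MM-dd HH:mm:ss clientIp bytes"
class ProxyLogSketch {

    // Returns {day, hour, ip, bytes}, or null if the line does not fit the assumed format.
    static String[] parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4 || f[1].length() < 2) return null;
        try {
            int hour = Integer.parseInt(f[1].substring(0, 2));
            Long.parseLong(f[3]);
            if (hour < 0 || hour > 23) return null;
        } catch (NumberFormatException e) {
            return null;
        }
        return new String[] {f[0], f[1].substring(0, 2), f[2], f[3]};
    }

    // Requirement 1: sum bytes per (day, IP), then keep the n largest IPs for each day.
    static Map<String, List<String>> topConsumersPerDay(List<String> lines, int n) {
        Map<String, Map<String, Long>> bytesByDayAndIp = new TreeMap<>();
        for (String line : lines) {
            String[] rec = parse(line);
            if (rec == null) continue; // skip malformed lines
            bytesByDayAndIp
                .computeIfAbsent(rec[0], d -> new TreeMap<>())
                .merge(rec[2], Long.parseLong(rec[3]), Long::sum);
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> day : bytesByDayAndIp.entrySet()) {
            List<Map.Entry<String, Long>> entries = new ArrayList<>(day.getValue().entrySet());
            // Largest byte count first.
            entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, entries.size()); i++) {
                top.add(entries.get(i).getKey());
            }
            result.put(day.getKey(), top);
        }
        return result;
    }

    // Requirement 2: sum bytes per hour of day (0-23) and return the hour with the largest total.
    static int busiestHour(List<String> lines) {
        long[] bytesByHour = new long[24];
        for (String line : lines) {
            String[] rec = parse(line);
            if (rec == null) continue;
            bytesByHour[Integer.parseInt(rec[1])] += Long.parseLong(rec[3]);
        }
        int best = 0;
        for (int h = 1; h < 24; h++) {
            if (bytesByHour[h] > bytesByHour[best]) best = h;
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2010-07-30 09:15:00 10.0.0.1 500",
            "2010-07-30 09:45:10 10.0.0.2 700",
            "2010-07-31 14:30:00 10.0.0.3 2000");
        System.out.println(topConsumersPerDay(lines, 50));
        System.out.println(busiestHour(lines));
    }
}
```

With a week of logs the number of distinct (day, IP) pairs may be large, so on a cluster the top-50 selection would more likely run as a second pass over the summed output rather than in memory as it does here.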