Message-ID: <72be46b0810051012n4ff2cf09h74031f5fec05498d@mail.gmail.com>
Date: Sun, 5 Oct 2008 22:42:42 +0530
From: Latha
To: core-user@hadoop.apache.org
Subject: How to modify hadoop-wordcount example to display File-wise results.

Hi,

I am trying to modify the WordCount.java given under "Example: WordCount v1.0" at
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html

I would like the output to look like this:

FileOne word1 itsCount
FileOne word2 itsCount
.. (and so on)
FileTwo word1 itsCount
FileTwo wordx itsCount
..
FileThree word1 itsCount
..

I am trying to make the following changes to the code of WordCount.java:

1) private Text filename = new Text(); // Added this to the Map class. Not sure whether I would have access to the file name here.
2) (line 18) OutputCollector output // Changed this argument of the map() function to have another Text field.
3) (line 23) output.collect(filename, word, one); // Trying to change the output format to 'filename word count'.

I am not sure what other changes are needed to achieve the required output. The file name is not available to the map method.

My requirement is to go through all the data available in HDFS and prepare an index file in <filename word count> format.

Could you please shed some light on how I can achieve this?

Thank you,
Srilatha
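
The requested 'filename word count' output can be approached by folding the file name into the key rather than adding a third argument to output.collect(). The sketch below is a plain-Java illustration of that composite-key idea only; it uses no Hadoop classes, and the class name FileWordCount, the method count(), and the tab separator are all hypothetical choices, not part of the tutorial code. In a real job the mapper would also need the current input file name, which, as far as I know, the Hadoop API of that era exposes through the map.input.file job property; that detail is an assumption and is not exercised here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are used here.
public class FileWordCount {

    // "Map" step: for every token, build the composite key fileName + TAB + word.
    // "Reduce" step: sum a 1 per occurrence of each composite key.
    // Both steps are folded into one loop because this runs in a single process.
    static Map<String, Integer> count(Map<String, String> fileContents) {
        // TreeMap keeps keys sorted, so entries for one file stay together.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : fileContents.entrySet()) {
            StringTokenizer tokens = new StringTokenizer(file.getValue());
            while (tokens.hasMoreTokens()) {
                String key = file.getKey() + "\t" + tokens.nextToken();
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory stand-ins for files stored in HDFS.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("FileOne", "hello world hello");
        files.put("FileTwo", "hello hadoop");
        // Print one "filename<TAB>word<TAB>count" line per composite key.
        for (Map.Entry<String, Integer> e : count(files).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because the key already carries the file name, the collector keeps its two-argument (key, value) shape, and the summing logic of the original reducer would not need to change.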