Date: Wed, 8 Apr 2015 18:44:26 +0530
Subject: Re: map() function call related
From: Shahil Varshney
To: hdfs-dev@hadoop.apache.org, cnauroth@hortonworks.com
Cc: mapreduce-dev-owner@hadoop.apache.org, mapreduce-dev@hadoop.apache.org

Sir,

Actually, I want to perform deduplication on input splits. For this, I
have to perform content-based chunking (using the TTTD algorithm) on each
input split, discard the chunks that are similar to previous chunks, and
send only the new chunks to map.

Sir, please tell me which class I should make changes in.

On Tue, Apr 7, 2015 at 10:43 PM, Chris Nauroth wrote:

> Hello Shahil,
>
> In the current trunk codebase, the relevant files are
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
> and
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Mapper.java.
> MapTask manages the execution of the mapper task, and eventually it calls
> Mapper#run, which then calls into the implementation of the map method.
> BTW, you'll also see a corresponding ReduceTask.java and Reducer.java in
> the same directories if you need to look at those too.
>
> Input split calculation is performed by a subclass of InputFormat.
>
> http://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/InputFormat.html
>
> I recommend looking at that.
> You also can navigate down through those JavaDocs to identify
> subclasses of InputFormat, like FileInputFormat and TextInputFormat,
> which you can then find in the source code.
>
> I hope this helps.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
> On 4/7/15, 6:09 AM, "Shahil Varshney" wrote:
>
> >Sir,
> >
> >I want to know which class in Hadoop (internal source class) is
> >responsible for calling the map function for each key-value pair (that
> >is, which class calls the map() function).
> >
> >Also, which class actually does the input split job? I want to create my
> >own class for input splits, so please tell me.
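The chunk-and-filter step Shahil describes can be sketched in plain Java, independent of Hadoop. This is a minimal sketch under stated assumptions, not Hadoop code and not a reference TTTD implementation. The class and method names (`TttdDedup`, `chunk`, `keepNewChunks`) are hypothetical. The rolling hash is a simple 48-byte polynomial hash standing in for the Rabin fingerprint normally used with TTTD (Two Thresholds, Two Divisors). Duplicates are detected by exact SHA-256 match, so only byte-identical chunks are dropped, not merely similar ones. No breakpoint is placed before `tMin` bytes. A hit on the main divisor `d` cuts immediately. Hits on the smaller backup divisor `dBackup` are remembered and used if the chunk reaches `tMax` without a main hit.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: TTTD content-defined chunking plus exact-duplicate
 * filtering. Plain Java, no Hadoop dependency. The rolling hash is a simple
 * polynomial hash (an assumption), not the Rabin fingerprint from the TTTD paper.
 */
final class TttdDedup {
    private static final int WINDOW = 48;   // bytes covered by the rolling hash (assumed value)
    private static final long BASE = 257;   // polynomial base (assumed value)
    private static final long POW;          // BASE^WINDOW, wrapping modulo 2^64

    static {
        long p = 1;
        for (int k = 0; k < WINDOW; k++) p *= BASE;
        POW = p;
    }

    /** Returns chunk boundaries as [start, end) offsets that cover all of data. */
    static List<int[]> chunk(byte[] data, int tMin, int tMax, long d, long dBackup) {
        List<int[]> chunks = new ArrayList<>();
        int start = 0;    // first byte of the chunk being built
        int backup = -1;  // exclusive end of the latest backup breakpoint, -1 if none
        long hash = 0;
        for (int i = 0; i < data.length; i++) {
            // Slide the window: add data[i], drop data[i - WINDOW].
            hash = hash * BASE + (data[i] & 0xff);
            if (i >= WINDOW) hash -= (data[i - WINDOW] & 0xff) * POW;

            int len = i - start + 1;            // current chunk length including data[i]
            if (len < tMin) continue;           // threshold 1: no breakpoint below tMin
            if (Long.remainderUnsigned(hash, dBackup) == dBackup - 1) backup = i + 1;
            if (Long.remainderUnsigned(hash, d) == d - 1) {  // main divisor hit: cut here
                chunks.add(new int[] {start, i + 1});
                start = i + 1;
                backup = -1;
                continue;
            }
            if (len < tMax) continue;           // threshold 2: force a cut at tMax
            int end = backup != -1 ? backup : i + 1;  // prefer the backup breakpoint
            chunks.add(new int[] {start, end});
            start = end;
            backup = -1;
        }
        if (start < data.length) chunks.add(new int[] {start, data.length}); // tail chunk
        return chunks;
    }

    /** Keeps only chunks whose SHA-256 digest is not already in seen; updates seen. */
    static List<byte[]> keepNewChunks(byte[] data, List<int[]> bounds, Set<String> seen) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // every Java platform must provide SHA-256
        }
        List<byte[]> fresh = new ArrayList<>();
        for (int[] b : bounds) {
            byte[] piece = Arrays.copyOfRange(data, b[0], b[1]);
            String key = Base64.getEncoder().encodeToString(sha.digest(piece));
            if (seen.add(key)) fresh.add(piece);  // add() returns false for a repeat
        }
        return fresh;
    }
}
```

For example, a 5000-byte run of zero bytes with `chunk(data, 64, 1024, 256, 128)` never hits either divisor, so it is cut into four identical 1024-byte chunks plus a 904-byte tail, and `keepNewChunks` keeps two of them. Within Hadoop, one option (an assumption, not something this thread confirms) is to apply this inside a custom `RecordReader` returned by an `InputFormat` subclass's `createRecordReader`, or in an overridden `Mapper#run`, so that only new chunks reach `map()`. A `seen` set held in one map task only covers that task's split; removing duplicates across splits would need shared state outside the task.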