Subject: Re: Trying to relate a split file to a input file
From: psdc1978
To: mapreduce-user@hadoop.apache.org
Date: Tue, 18 May 2010 21:57:26 +0100
In-Reply-To: <810AF638-FD5B-4C74-8A90-763CA19B9001@gmail.com>

I don't think the wordcount example uses the FileSplit class. Only the
MultithreadedMapper class uses FileSplit, and I can't find an example where
it's invoked.

Where is the setup() method?

On Tue, May 18, 2010 at 6:50 PM, Wilkes, Chris <cwilkes@gmail.com> wrote:

> In your setup(), look at context.getInputSplit(); this will be a FileSplit
> in your case. From there you can call getPath() to see both the
> directory structure and the split value.
>
> On May 18, 2010, at 10:01 AM, psdc1978 wrote:
>
>> Hi,
>>
>> I'm studying the MapReduce code, and I have the following questions:
>>
>> 1 - I'm running the wordcount example. I have 3 txt files as input. Each
>> txt file is about 120 MB.
>>
>> During the execution of the map tasks, a number of map tasks will read
>> the txt files. Each file is divided into splits. I would like to know
>> which txt file each split corresponds to.
>> For example, for the A.txt file, 2 splits (split0 and split1) of up to
>> 64 MB each will be created. I would like to know that split0 and split1
>> belong to A.txt.
>> Is it possible? If I have to write some code, is there any object that
>> contains this data?
>>
>> 2 - The Job task uses a job.split file. What does this file contain, and
>> what is its purpose?
>>
>> Thanks,
>>
>> --
>> PSC

--
Pedro
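To make the split-to-file relationship from question 1 concrete, here is a minimal, Hadoop-free sketch. It is not Hadoop's own code: SplitSketch, Split and computeSplits are hypothetical names, B.txt and C.txt are made-up file names, and the splitting rule is a simplification that ignores minimum/maximum split size settings and block locations. It only illustrates that each split can carry the path of the file it was cut from, which is the information Chris suggests reading through getPath() on the FileSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free sketch: it only models the bookkeeping that links
// each split back to its source file. None of these names are Hadoop classes.
class SplitSketch {

    /** One split: the file it came from, its byte offset, and its length. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    /**
     * Cuts a file of fileLength bytes into consecutive splits of at most
     * splitSize bytes. Simplifying assumption: no minimum/maximum split size
     * settings and no block-location handling.
     */
    static List<Split> computeSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            // The last split takes whatever is left of the file.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Split(path, offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Three input files of about 120 MB each, as in the question.
        for (String name : new String[] {"A.txt", "B.txt", "C.txt"}) {
            for (Split s : computeSplits(name, 120 * mb, 64 * mb)) {
                System.out.println(s); // every split names its source file
            }
        }
    }
}
```

Under these assumptions each 120 MB file yields two splits, one of 64 MB and one of 56 MB, and both record the name of the file they belong to.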