From: Mohammad Tariq
Date: Wed, 1 May 2013 00:26:13 +0530
Subject: Re: partition as block?
To: user@hadoop.apache.org

Hello Jay,

What are you going to do in your custom InputFormat and partitioner? Is your InputFormat going to create larger splits that overlap with larger blocks? If that is the case, IMHO, you are going to reduce the number of mappers and thus reduce the parallelism. Also, a much larger block size will add extra overhead when it comes to disk I/O.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Wed, May 1, 2013 at 12:16 AM, Jay Vyas wrote:
> Hi guys:
>
> I'm wondering - if I'm running MapReduce jobs on a cluster with large block
> sizes - can I increase performance with either:
>
> 1) A custom FileInputFormat
>
> 2) A custom partitioner
>
> 3) -DnumReducers
>
> Clearly, (3) will be an issue due to the fact that it might overload tasks
> and network traffic... but maybe (1) or (2) will be a precise way to "use"
> partitions as a "poor man's" block.
>
> Just a thought - not sure if anyone has tried (1) or (2) before in order
> to simulate blocks and increase locality by utilizing the partition API.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
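
For reference, a minimal sketch of option (1), assuming the new mapreduce API
(org.apache.hadoop.mapreduce): a TextInputFormat subclass that asks for splits
spanning several HDFS blocks by overriding computeSplitSize(). The class name
and the BLOCKS_PER_SPLIT value below are illustrative assumptions, not from
this thread.

    // Hypothetical sketch: an InputFormat whose splits cover several HDFS blocks.
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class MultiBlockTextInputFormat extends TextInputFormat {

        // Stretch each split across this many consecutive blocks (illustrative value).
        private static final int BLOCKS_PER_SPLIT = 4;

        @Override
        protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
            // FileInputFormat's default is max(minSize, min(maxSize, blockSize)),
            // i.e. roughly one split per block; here a split covers several blocks.
            return blockSize * BLOCKS_PER_SPLIT;
        }
    }

Wiring it into a job would look like
job.setInputFormatClass(MultiBlockTextInputFormat.class); a similar effect can
be had without a subclass by calling
FileInputFormat.setMinInputSplitSize(job, desiredBytes). Either way, fewer and
larger splits mean fewer map tasks, which is the parallelism trade-off Tariq
describes above.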