From: Shekhar Sharma
To: user@hadoop.apache.org
Date: Fri, 30 Aug 2013 12:39:46 +0530
Subject: Re: secondary sort - number of reducers

Is the hash code of that key negative? Do something like this (the
parentheses matter, because % binds more tightly than &):

return (groupKey.hashCode() & Integer.MAX_VALUE) % numParts;

Regards,
Som Shekhar Sharma
+91-8197243810

On Fri, Aug 30, 2013 at 6:25 AM, Adeel Qureshi wrote:
> Okay, so when I specify the number of reducers (e.g. in my example I'm using 4,
> for a much smaller data set), it works if I use a single column in my
> composite key. But if I add multiple columns to the composite key,
> separated by a delimiter,
> it then throws the illegal partition error (keys before the pipe are the
> group keys and keys after the pipe are the sort keys, and my partitioner
> only uses the group keys):
>
> java.io.IOException: Illegal partition for Atlanta:GA|Atlanta:GA:1:Adeel (-1)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:39)
>     at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:1)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> public int getPartition(Text key, HCatRecord record, int numParts) {
>     // extract the group key from the composite key
>     String groupKey = key.toString().split("\\|")[0];
>     return groupKey.hashCode() % numParts;
> }
>
>
> On Thu, Aug 29, 2013 at 8:31 PM, Shekhar Sharma wrote:
>>
>> No. The partitioner decides which keys go to which reducer; the number
>> of reducers is something you need to decide. It depends on factors like
>> the number of key-value pairs, the use case, etc.
>> Regards,
>> Som Shekhar Sharma
>> +91-8197243810
>>
>>
>> On Fri, Aug 30, 2013 at 5:54 AM, Adeel Qureshi wrote:
>> > So it can't figure out an appropriate number of reducers as it does for
>> > mappers? In my case Hadoop is using 2100+ mappers and then only 1
>> > reducer.
>> > Since I'm overriding the partitioner class, shouldn't that decide how
>> > many reducers there should be, based on how many different partition
>> > values are returned by the custom partitioner?
>> >
>> >
>> > On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley wrote:
>> >>
>> >> If you don't specify the number of Reducers, Hadoop will use the
>> >> default -- which, unless you've changed it, is 1.
>> >>
>> >> Regards
>> >>
>> >> Ian.
>> >>
>> >> On Aug 29, 2013, at 4:23 PM, Adeel Qureshi wrote:
>> >>
>> >> I have implemented secondary sort in my MR job, and for some reason if I
>> >> don't specify the number of reducers it uses 1, which doesn't seem right
>> >> because I'm working with 800M+ records and one reducer slows things down
>> >> significantly. Is this some kind of limitation of secondary sort, that it
>> >> has to use a single reducer? That would kind of defeat the purpose of
>> >> having a scalable solution such as secondary sort. I would appreciate any
>> >> help.
>> >>
>> >> Thanks
>> >> Adeel
>> >>
>> >>
>> >>
>> >> ---
>> >> Ian Wrigley
>> >> Sr. Curriculum Manager
>> >> Cloudera, Inc
>> >> Cell: (323) 819 4075
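For anyone hitting the same "Illegal partition" error, here is a minimal, JDK-only sketch of the partitioning arithmetic discussed in this thread. It is an illustration under stated assumptions, not the poster's actual code: the class and method names (CompositeKeyPartition, groupKey, buggyPartition, misparenthesizedPartition, partitionFor) are hypothetical, and the Hadoop types (Text, HCatRecord, Partitioner) are stripped out so it runs on its own. In a real job, the body of partitionFor would sit inside getPartition(Text key, HCatRecord record, int numParts). The corrected expression masks off the sign bit before taking the remainder, the same approach Hadoop's own HashPartitioner uses.

```java
// Hypothetical JDK-only sketch: Hadoop types are removed so the partition
// arithmetic can be run and checked in isolation.
public class CompositeKeyPartition {

    // Group key = everything before the first '|'. String.split takes a regex,
    // so the pipe has to be escaped, as in the original getPartition.
    static String groupKey(String compositeKey) {
        return compositeKey.split("\\|")[0];
    }

    // Version from the question: in Java, % keeps the sign of the dividend,
    // so a negative hashCode() yields a negative (illegal) partition number.
    static int buggyPartition(String compositeKey, int numParts) {
        return groupKey(compositeKey).hashCode() % numParts;
    }

    // Precedence pitfall: % binds more tightly than &, so this evaluates as
    // hash & (Integer.MAX_VALUE % numParts). It is never negative, but it can
    // skip partitions: for numParts == 3 it only ever returns 0 or 1.
    // (For power-of-two numParts such as 4 it happens to match the fixed version.)
    static int misparenthesizedPartition(String compositeKey, int numParts) {
        return groupKey(compositeKey).hashCode() & Integer.MAX_VALUE % numParts;
    }

    // Corrected: clear the sign bit first, then take the remainder. The result
    // is always in [0, numParts) and depends only on the group key, so every
    // record of a group reaches the same reducer.
    static int partitionFor(String compositeKey, int numParts) {
        return (groupKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numParts;
    }

    public static void main(String[] args) {
        String key = "Atlanta:GA|Atlanta:GA:1:Adeel"; // key from the stack trace
        int numParts = 4;
        System.out.println("group key:       " + groupKey(key));
        System.out.println("buggy partition: " + buggyPartition(key, numParts));
        System.out.println("fixed partition: " + partitionFor(key, numParts));
    }
}
```

Running java CompositeKeyPartition compares the buggy and fixed values for the key from the stack trace. Note that the partitioner never decides how many reducers exist; it only picks one of the numParts it is handed. The job still has to set the count itself (for example with job.setNumReduceTasks(4) on an org.apache.hadoop.mapreduce.Job), otherwise Hadoop falls back to its default of 1.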