Subject: Re: MapReduce: Reducers partitions.
From: Graeme Wallace
To: "user@hbase.apache.org"
Date: Wed, 10 Apr 2013 14:06:05 -0500

Ok. Thanks.

On Wed, Apr 10, 2013 at 2:01 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Graeme,
>
> No. The reducer will simply write to the table the same way you would with a
> regular Put. If a split is required because of the size, then the region
> will be split, but in the end there will not necessarily be any region
> split.
>
> In the use case described below, all 600 lines will "simply" go into the
> only region in the table and no split will occur.
>
> The goal is to partition the data for the reducers only, not in the table.
>
> JM
>
> 2013/4/10 Graeme Wallace
>
> > What's the behavior then if you return hash % num_reducers and you have no
> > splits defined? When the reducer writes to the table, does the region server
> > local to the reducer create a new region?
> >
> > Graeme
> >
> >
> > On Wed, Apr 10, 2013 at 1:26 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >
> > > So.
> > >
> > > I looked at the code, and I have one comment/suggestion here.
> > >
> > > If the table we are outputting to has regions, then partitions are built
> > > around them, and that's fine. But if the table is totally empty with a
> > > single region, even if we setNumReduceTasks to 2 or more, all the keys will
> > > go to the same first reducer because of this:
> > >     if (this.startKeys.length == 1){
> > >       return 0;
> > >     }
> > > I think it would be better to return something like keycrc%numPartitions
> > > instead. That still allows the application to spread the reducing process
> > > over multiple nodes (racks) even if there is only one region in the table.
> > >
> > > In my use case, I have millions of lines producing some statistics. At the
> > > end, I will have only about 600 lines, but it will take a lot of map and
> > > reduce time to go from millions to 600; that's why I'm looking to have more
> > > than one reducer. However, with only 600 lines, it's very difficult to
> > > pre-split the table. Keys are all very close.
> > >
> > > Does anyone see anything wrong with changing this default behaviour when
> > > startKeys.length == 1? If not, I will open a JIRA and upload a patch.
> > >
> > > JM
> > >
> > > 2013/4/10 Jean-Marc Spaggiari
> > >
> > > > Thanks Ted.
> > > >
> > > > That's exactly where I was looking just now. I was close. I will take a
> > > > deeper look.
> > > >
> > > > Thanks Nitin for the link. I will read that too.
> > > >
> > > > JM
> > > >
> > > > 2013/4/10 Nitin Pawar
> > > >
> > > >> To add to what Ted said,
> > > >>
> > > >> the same discussion happened on the question Jean asked
> > > >>
> > > >> https://issues.apache.org/jira/browse/HBASE-1287
> > > >>
> > > >>
> > > >> On Wed, Apr 10, 2013 at 7:28 PM, Ted Yu wrote:
> > > >>
> > > >> > Jean-Marc:
> > > >> > Take a look at HRegionPartitioner which is in both mapred and mapreduce
> > > >> > packages:
> > > >> >
> > > >> > * This is used to partition the output keys into groups of keys.
> > > >> >
> > > >> > * Keys are grouped according to the regions that currently exist
> > > >> >
> > > >> > * so that each reducer fills a single region so load is distributed.
> > > >> >
> > > >> > Cheers
> > > >> >
> > > >> > On Wed, Apr 10, 2013 at 6:54 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > > >> >
> > > >> > > Hi Nitin,
> > > >> > >
> > > >> > > You got my question correctly.
> > > >> > >
> > > >> > > However, I'm wondering how it works when it's done in HBase. Do
> > > >> > > we have default partitioners so that we have the same guarantee that
> > > >> > > records mapping to one key go to the same reducer? Or do we have to
> > > >> > > implement this on our own?
> > > >> > >
> > > >> > > JM
> > > >> > >
> > > >> > > 2013/4/10 Nitin Pawar :
> > > >> > > > I hope I understood what you are asking. If not, then pardon me :)
> > > >> > > > A few lines from the Hadoop developer handbook:
> > > >> > > >
> > > >> > > > The *Partitioner* class determines which partition a given (key, value) pair
> > > >> > > > will go to. The default partitioner computes a hash value for the key and
> > > >> > > > assigns the partition based on this result.
> > > >> > > > It guarantees that all the
> > > >> > > > records mapping to one key go to the same reducer.
> > > >> > > >
> > > >> > > > You can write your custom partitioner as well;
> > > >> > > > here is the link:
> > > >> > > > http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Apr 10, 2013 at 6:19 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > > >> > > >
> > > >> > > >> Hi,
> > > >> > > >>
> > > >> > > >> Quick question. How is the data from the map tasks partitioned for
> > > >> > > >> the reducers?
> > > >> > > >>
> > > >> > > >> If there is 1 reducer, it's easy, but if there are more, are all the
> > > >> > > >> same keys guaranteed to end up on the same reducer? Or not necessarily?
> > > >> > > >> If they are not, how can we provide a partitioning function?
> > > >> > > >>
> > > >> > > >> Thanks,
> > > >> > > >>
> > > >> > > >> JM
> > > >> > > >>
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Nitin Pawar
> > > >>
> > > >> --
> > > >> Nitin Pawar
> >
> > --
> > Graeme Wallace
> > CTO
> > FareCompare.com
> > O: 972 588 1414
> > M: 214 681 9018
>

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
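
For reference, the single-region fallback suggested earlier in the thread (return something like keycrc%numPartitions instead of 0) could be sketched roughly as below. This is only an illustration, not the HRegionPartitioner source and not the patch that was proposed: the class name SingleRegionFallback, the method partitionFor and the startKeyCount parameter (standing in for this.startKeys.length) are all made up for the example, and the region-aware branch is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch of the fallback discussed in the thread.
 * Not HBase code: only the "table has a single region" case is modeled.
 */
public class SingleRegionFallback {

    /**
     * Picks a reducer index for a row key.
     * startKeyCount is an assumed stand-in for this.startKeys.length.
     */
    public static int partitionFor(byte[] rowKey, int startKeyCount, int numPartitions) {
        if (startKeyCount == 1) {
            // Suggested change: spread keys by checksum instead of returning 0.
            CRC32 crc = new CRC32();
            crc.update(rowKey);
            // getValue() is a non-negative long, so the result is in [0, numPartitions).
            return (int) (crc.getValue() % numPartitions);
        }
        // Mapping keys to existing regions is out of scope for this sketch.
        throw new UnsupportedOperationException("region-based lookup not modeled");
    }

    public static void main(String[] args) {
        int reducers = 4;
        String[] keys = {"stat-001", "stat-002", "stat-003", "stat-001"};
        for (String key : keys) {
            int p = partitionFor(key.getBytes(StandardCharsets.UTF_8), 1, reducers);
            System.out.println(key + " -> reducer " + p);
        }
    }
}
```

Because the checksum depends only on the key bytes, equal keys always land on the same reducer, which is the guarantee asked about at the start of the thread. As noted in the reply above, this only affects how work is spread across reducers; it does not create or split regions in the table.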