From: Shuja Rehman
To: user@hbase.apache.org
Date: Mon, 15 Nov 2010 13:24:37 +0500
Subject: Re: Bulk Load Sample Code
References: <4CDD8328.9030007@opendns.com>

If HRegionPartitioner works correctly, then what is the use of
configureIncrementalLoad() as discussed here:

http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html

And why does that page not mention HRegionPartitioner? Where is the
documentation for HRegionPartitioner, and in which cases should we prefer
HRegionPartitioner over configureIncrementalLoad()? What is the difference
between the two?

On Sat, Nov 13, 2010 at 5:12 AM, Todd Lipcon wrote:

> I'm surprised that HRegionPartitioner works correctly for incremental load.
> It definitely won't work if the regions are also shifting during the MR
> job.
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 10:10 AM, Adam Phelps wrote:
>
> > On 11/10/10 11:57 AM, Stack wrote:
> >
> >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman wrote:
> >>
> >>> oh! I think u have not read the full post. The essay has 3 paragraphs :)
> >>>
> >>> Should I need to add the following line also
> >>>
> >>> job.setPartitionerClass(TotalOrderPartitioner.class);
> >>>
> >> You need to specify other than default partitioner so yes, above seems
> >> necessary (Be aware that if only one reducer, all may appear to work
> >> though your partitioner is bad... it's when you have multiple reducers
> >> that bad partitioner will show).
> >>
> >
> > I skimmed over this thread as we've been using LoadIncrementalHFiles to
> > load the output of our MR jobs, however it looks like we're using
> > HRegionPartitioner rather than TotalOrderPartitioner. The current code is
> > definitely working, however the page regarding bulk loads that was posted
> > earlier implies that TotalOrderPartitioner is best for efficiency. What is
> > the difference between the two?
> >
> > - Adam
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
Regards
Shuja-ur-Rehman Baig
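
A small illustration of the partitioner point raised in the quoted thread may
help. The sketch below is plain Java with no Hadoop or HBase dependency, so it
is not the code of TotalOrderPartitioner or HRegionPartitioner. It only
contrasts range partitioning over sorted split points (the idea behind a total
order partitioner) with hash partitioning in the style of Hadoop's default
partitioner. The class name, the region start keys, and the row keys are all
hypothetical, and String comparison stands in for HBase's byte-wise row key
ordering.

```java
import java.util.Arrays;

public class RangePartitionSketch {
    // Hypothetical, sorted region start keys; the first region starts at the empty key.
    static final String[] REGION_START_KEYS = {"", "g", "n", "t"};

    // Range partitioning: choose the last region whose start key is <= rowKey,
    // so each partition receives one contiguous slice of the key space.
    static int rangePartition(String rowKey, String[] sortedStartKeys) {
        int pos = Arrays.binarySearch(sortedStartKeys, rowKey);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Hash partitioning, in the style of Hadoop's default HashPartitioner:
    // keys are spread across reducers with no regard to key order.
    static int hashPartition(String rowKey, int numReducers) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        String[] rows = {"apple", "grape", "melon", "peach", "zebra"};
        for (String row : rows) {
            System.out.println(row
                    + " range=" + rangePartition(row, REGION_START_KEYS)
                    + " hash=" + hashPartition(row, REGION_START_KEYS.length));
        }
    }
}
```

With a single reducer both functions can only return partition 0, which is why
a poorly chosen partitioner can appear to work until the job runs with several
reducers. With several reducers, only the range version keeps each reducer's
output inside one region's key range.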