From: Todd Lipcon
Date: Wed, 13 Oct 2010 12:39:41 +0000
Subject: Re: Bulk import tools for HBase
To: user@hbase.apache.org

On Mon, Oct 11, 2010 at 9:33 PM, Sean Bigdatafun wrote:

> Another potential "problem" of incremental bulk loader is that the number of
> reducers (for the bulk loading process) needs to be equal to the existing
> regions -- this seems to be unfeasible for very large table, say with 2000
> regions.
>
> Any comment on this? Thanks.

Yes, this is currently problematic if you have a very large table (2000
regions) and a small MR cluster (where 2000 reducers is too many). It
wouldn't be too difficult to amend the code so that each reducer is
responsible for a contiguous range of regions and knows to split the
HFiles at region boundaries. Patches welcome :)

-Todd

>
> Sean
>
> On Fri, Oct 8, 2010 at 9:03 PM, Todd Lipcon wrote:
>
>> What version are you building from? These tools are new as of this past
>> June.
>>
>> -Todd
>>
>> On Fri, Oct 8, 2010 at 4:52 PM, Leo Alekseyev wrote:
>>
>> > We want to investigate HBase bulk imports, as described on
>> > http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html and/or
>> > JIRA HBASE-48. I can't seem to run either the importtsv tool or the
>> > completebulkload tool using the hadoop jar /path/to/hbase-VERSION.jar
>> > command. In fact, the ImportTsv class is not part of that jar file.
>> > Am I looking in the wrong place for this class, or do I need to
>> > somehow customize the build process to include it?.. Our HBase was
>> > built from source using the default procedure.
>> >
>> > Thanks for any insight,
>> > --Leo
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

-- 
Todd Lipcon
Software Engineer, Cloudera
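
For anyone picking up the "patches welcome" suggestion, here is a rough sketch of the idea from the reply above: give each reducer a contiguous run of regions, then cut that reducer's sorted output wherever a region boundary falls. This is plain Python for illustration, not HBase code. The function names, the sample start keys, and the use of key lists as stand-ins for HFiles are all assumptions made for the example. The real change would have to go into the Java bulk-load job.

```python
from bisect import bisect_right
from itertools import groupby


def region_index(region_start_keys, row_key):
    """Index of the region whose [start, next_start) range holds row_key.

    Assumes region_start_keys is sorted and its first entry is the empty
    key, so every row key falls into some region.
    """
    return bisect_right(region_start_keys, row_key) - 1


def assign_regions(num_regions, num_reducers):
    """Return a list mapping region index -> reducer index.

    Each reducer gets one contiguous run of regions. Reducers beyond the
    number of regions would have nothing to do, so the count is capped.
    """
    num_reducers = min(num_reducers, num_regions)
    return [r * num_reducers // num_regions for r in range(num_regions)]


def partition(region_start_keys, region_to_reducer, row_key):
    """Pick the reducer for a row key (the partitioner's job)."""
    return region_to_reducer[region_index(region_start_keys, row_key)]


def split_at_region_boundaries(region_start_keys, sorted_row_keys):
    """Group one reducer's sorted keys into one batch per region.

    Each batch stands in for a separate output file, so that no file
    spans two regions.
    """
    by_region = groupby(
        sorted_row_keys, key=lambda k: region_index(region_start_keys, k)
    )
    return [(idx, list(keys)) for idx, keys in by_region]


if __name__ == "__main__":
    # Hypothetical table with four regions and a job with two reducers.
    start_keys = [b"", b"g", b"n", b"t"]
    owners = assign_regions(len(start_keys), 2)
    print(owners)
    print(partition(start_keys, owners, b"pear"))
    print(split_at_region_boundaries(start_keys, [b"apple", b"fig", b"kiwi"]))
```

With this layout the reducer count is set by the size of the MR cluster rather than by the region count, while every output file still maps to exactly one region.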