Date: Thu, 1 Nov 2012 19:03:34 +0200
Subject: Bulk Loading - LoadIncrementalHFiles
From: Amit Sela
To: user@hbase.apache.org

Hi everyone,

I'm using MR to bulk load into HBase: I configure the job with HFileOutputFormat.configureIncrementalLoad, and after the job completes I call LoadIncrementalHFiles.doBulkLoad.

From what I see, the MR job outputs a file for each CF written, and to my understanding these files are loaded as store files into a region.

What I don't understand is *how many regions will open* and *how that is determined*. If I have 3 CFs and a lot of data to load, does that mean 3 large store files will be loaded into 1 region (or more?), and that this region will split on major compaction? Can I pre-create regions and tell the bulk load to split the data between them during the load?

In general, if someone could elaborate on LoadIncrementalHFiles, it would save me a lot of time diving into it.

Another question is about overwriting values: is it possible to load an updated value, or more generally to update columns and values for an existing key? I'd think there's no problem, but when I run the same bulk load twice (MR and then load) with the same data, the second run fails. Right after

mapreduce.LoadIncrementalHFiles: Trying to load hfile=........

I get:

ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution exception during splitting...

Thanks!