Subject: Re: Fast importing into HBase (bypassing RegionServer)
From: stack
To: hbase-user@hadoop.apache.org
Date: Mon, 27 Jul 2009 13:52:06 -0700

Latest thinking is to write an MR job whose reducer writes hfiles that are just under a region size (<256M). When the reducer has written about 240MB, it opens a new file. (May need to write a custom ReduceRunner to keep account of what's been written and to rotate the file.)

After the MR job has finished, a script would come along and move the hfiles into the appropriate directory structure. Each hfile would be the sole content of its region. The script would read each hfile's first and last keys from its metadata and then, using this metainfo along with a table format specified externally, insert one entry into .META. per region. (See the scripts in bin -- copy and rename table -- for examples of how to manipulate .META.)

Someone needs to just do it. We've been talking about it forever.

St.Ack

P.S. Here is older thinking on the topic: https://issues.apache.org/jira/browse/HBASE-48

On Mon, Jul 27, 2009 at 1:31 PM, tim robertson wrote:
> Hi all,
>
> Ryan wrote on a different thread:
>
> "It should be possible to randomly insert data from a pre-existing
> data set. There is some work to directly import straight into hfiles
> and skipping the regionserver, but that would only really work on 1
> time imports to new tables."
>
> Could someone please elaborate on this a little and outline the steps
> needed? Do you write an hfile in a custom mapreduce output format and
> then somehow write the table metadata file afterwards?
>
> Cheers,
>
> Tim
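
For anyone who wants to reason about the plan in the reply above before touching HBase, here is a rough sketch of just the bookkeeping: rolling to a new file once about 240MB has been written, and turning each file's first key into a per-region key range. It is plain Python and uses no HBase or Hadoop API. The names PlannedFile, plan_files and region_ranges are made up for illustration, the size estimate (key length plus value length) is a stand-in for real hfile size, and two details are assumptions not stated in the message: files only roll when the key changes, so a row never straddles two files, and each region's end key is taken to be the next file's first key, with empty start and end keys for the first and last regions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# ~240MB roll point from the message; stays under the 256MB region size.
ROLL_THRESHOLD = 240 * 1024 * 1024


@dataclass
class PlannedFile:
    """Hypothetical stand-in for one hfile: tracks only what the follow-up script needs."""
    first_key: bytes
    last_key: bytes
    size: int = 0   # rough estimate: sum of key and value lengths
    cells: int = 0


def plan_files(sorted_cells: Iterable[Tuple[bytes, bytes]],
               threshold: int = ROLL_THRESHOLD) -> List[PlannedFile]:
    """Walk key-sorted (key, value) pairs, as a reducer would see them,
    and start a new file once the current one has reached the threshold."""
    files: List[PlannedFile] = []
    current: Optional[PlannedFile] = None
    for key, value in sorted_cells:
        if current is not None and key < current.last_key:
            raise ValueError("input must be sorted by key")
        full = current is not None and current.size >= threshold
        # Assumption: only roll when the key changes, so one row stays in one file.
        if current is None or (full and key != current.last_key):
            current = PlannedFile(first_key=key, last_key=key)
            files.append(current)
        current.last_key = key
        current.size += len(key) + len(value)
        current.cells += 1
    return files


def region_ranges(files: List[PlannedFile]) -> List[Tuple[bytes, bytes]]:
    """Derive one (start_key, end_key) pair per file.

    Assumption: ranges tile the key space, with the start key inclusive,
    the end key exclusive, and b"" marking the open ends of the table."""
    ranges = []
    for i, f in enumerate(files):
        start = b"" if i == 0 else f.first_key
        end = b"" if i == len(files) - 1 else files[i + 1].first_key
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Tiny threshold so the roll is visible with a handful of cells.
    demo = [(b"a", b"xxxx"), (b"b", b"xxxx"), (b"b", b"yyyy"),
            (b"c", b"xxxx"), (b"d", b"xxxx")]
    planned = plan_files(demo, threshold=8)
    for f, (start, end) in zip(planned, region_ranges(planned)):
        print(f.first_key, f.last_key, f.size, "region:", start, end)
```

With the demo input the two b"b" cells stay together even though the first file is already past the threshold, and the second file starts at b"c". Writing the actual hfiles, moving them into place and inserting the .META. rows are all left out; those steps depend on the HBase version in use.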