Date: Mon, 27 Jul 2009 13:56:40 -0700
Subject: Re: Fast importing into HBase (bypassing RegionServer)
From: Ryan Rawson
To: hbase-user@hadoop.apache.org

The last time I seriously looked at this, it was to answer serious
performance issues with HBase. I eventually fixed said performance
issues, and thus went on to drop the idea overall.

-ryan

On Mon, Jul 27, 2009 at 1:52 PM, stack wrote:

> Latest thinking is write a MR job that in the reducer writes hfiles that are
> just under a region size (<256M). When reducer has reached about 240MB, it
> opens new file. (May need to write custom ReduceRunner to keep account of
> whats been written and to rotate the file).
>
> After the MR has finished, a script would come along, move the hfiles into
> appropriate directory structure. Each hfile would be the sole content of
> the region. The script would read from each hfile's metadata its first and
> last keys and then using this metainfo along with a table format specified
> externally, insert an entry into .META. per region (See the scripts in bin
> -- copy and rename table -- for examples of how to manipulate .META.).
>
> Someone needs to just do it. We've been talking about it for ever.
>
> St.Ack
> P.S. Here is older thinking on the topic
> https://issues.apache.org/jira/browse/HBASE-48
>
> On Mon, Jul 27, 2009 at 1:31 PM, tim robertson wrote:
>
>> Hi all,
>>
>> Ryan wrote on a different thread:
>>
>> "It should be possible to randomly insert data from a pre-existing
>> data set. There is some work to directly import straight into hfiles
>> and skipping the regionserver, but that would only really work on 1
>> time imports to new tables."
>>
>> Could someone please elaborate on this a little and outline the steps
>> needed? Do you write an hfile in a custom mapreduce output format and
>> then somehow write the table metadata file afterwards?
>>
>> Cheers,
>>
>> Tim
>>
>
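
For anyone following the thread, here is a minimal sketch of the two steps stack outlines: rolling to a new file once a size threshold is reached, then deriving one region entry per file from its keys. It is plain Python and makes no HBase or Hadoop calls. The names roll_sorted_rows and region_entries are hypothetical, in-memory lists stand in for hfiles, and the boundary convention (each region ends where the next file's first key begins, with empty keys at the two outer edges) is an assumption of this sketch rather than something stated in the thread.

```python
from typing import Iterable, List, Tuple

# (row key, value). Keys are assumed to arrive sorted, as they would in a reducer.
Row = Tuple[bytes, bytes]


def roll_sorted_rows(rows: Iterable[Row], roll_bytes: int) -> List[List[Row]]:
    """Group sorted rows into chunks; each chunk stands in for one hfile.

    A new chunk is started once the current one has reached roll_bytes,
    mirroring "when reducer has reached about 240MB, it opens new file".
    """
    chunks: List[List[Row]] = []
    current: List[Row] = []
    written = 0
    for key, value in rows:
        if current and written >= roll_bytes:
            chunks.append(current)
            current, written = [], 0
        current.append((key, value))
        written += len(key) + len(value)
    if current:
        chunks.append(current)
    return chunks


def region_entries(chunks: List[List[Row]]) -> List[Tuple[bytes, bytes]]:
    """Return one (start_key, end_key) pair per chunk.

    Assumed convention: the first region starts at b"", the last ends at b"",
    and every other boundary is the first key of the following chunk.
    """
    first_keys = [chunk[0][0] for chunk in chunks]
    entries = []
    for i in range(len(chunks)):
        start = b"" if i == 0 else first_keys[i]
        end = b"" if i == len(chunks) - 1 else first_keys[i + 1]
        entries.append((start, end))
    return entries


# Tiny demonstration: 16-byte rows and a 48-byte threshold instead of ~240MB.
rows = [(b"row%03d" % i, b"x" * 10) for i in range(10)]
chunks = roll_sorted_rows(rows, roll_bytes=48)
for start, end in region_entries(chunks):
    print(start, end)
```

In a real job the threshold would be around 240MB so that each file stays under the 256M region size mentioned above. Writing actual hfiles, moving them into the table's directory layout, and inserting the rows into .META. are deliberately left out of the sketch.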