hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Fast importing into HBase (bypassing RegionServer)
Date Mon, 27 Jul 2009 20:52:06 GMT
Latest thinking is to write an MR job whose reducer writes hfiles that are
just under a region size (<256M).  When the reducer has written about 240MB,
it opens a new file.  (May need to write a custom ReduceRunner to keep track
of what's been written and to rotate the file.)
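The rotation described above can be sketched as plain size accounting: count the bytes each key/value adds and roll to a fresh file before crossing the threshold.  This is a minimal sketch of the bookkeeping only; `RollingHFileWriter` is a hypothetical name, it writes to an ordinary `OutputStream` rather than a real HFile writer, and a real reducer would wrap HBase's hfile-writing code instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the size-tracked rotation a custom ReduceRunner
// would do.  In practice the threshold would be ~240MB (under the 256M
// region size); it is a constructor parameter here so the logic is testable.
public class RollingHFileWriter {
    private final Path dir;
    private final long rollThreshold;
    private long bytesWritten = 0;
    private int fileIndex = 0;
    private OutputStream current;

    public RollingHFileWriter(Path dir, long rollThreshold) throws IOException {
        this.dir = dir;
        this.rollThreshold = rollThreshold;
        roll();
    }

    // Close the current file (if any), open the next one, reset the counter.
    private void roll() throws IOException {
        if (current != null) current.close();
        current = Files.newOutputStream(dir.resolve("hfile-" + (fileIndex++)));
        bytesWritten = 0;
    }

    // Write one key/value pair, rotating first if this write would
    // push the current file past the threshold.
    public void write(byte[] key, byte[] value) throws IOException {
        long recordSize = key.length + value.length;
        if (bytesWritten > 0 && bytesWritten + recordSize > rollThreshold) roll();
        current.write(key);
        current.write(value);
        bytesWritten += recordSize;
    }

    public void close() throws IOException { current.close(); }

    public int fileCount() { return fileIndex; }
}
```

Rotating *before* the write that would cross the threshold keeps every file under the limit, which matters here since each file is meant to become a whole region.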

After the MR has finished, a script would come along and move the hfiles into
the appropriate directory structure.  Each hfile would be the sole content of
its region.  The script would read each hfile's first and last keys from its
metadata and then, using this metainfo along with a table schema specified
externally, insert an entry into .META. per region (See the scripts in bin
-- copy and rename table -- for examples of how to manipulate .META.).
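The key derivation that script would do can be sketched independently of .META. itself: given the sorted first keys read from each hfile's metadata, each region's start key is its hfile's first key (empty for the first region, by HBase convention) and its end key is where the next region begins (empty for the last).  `RegionEntry` and `metaRowsFor` are illustrative names for this sketch, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deriving per-region .META. entries from the
// first keys read out of each hfile's metadata.
public class MetaEntryBuilder {
    static class RegionEntry {
        final String startKey, endKey; // "" means open-ended (table start/end)
        RegionEntry(String s, String e) { startKey = s; endKey = e; }
    }

    // hfileFirstKeys: the first key of each hfile, in sorted order.
    static List<RegionEntry> metaRowsFor(List<String> hfileFirstKeys) {
        List<RegionEntry> entries = new ArrayList<>();
        for (int i = 0; i < hfileFirstKeys.size(); i++) {
            // First region conventionally starts at the empty key;
            // each region ends where the next one begins.
            String start = (i == 0) ? "" : hfileFirstKeys.get(i);
            String end = (i + 1 < hfileFirstKeys.size())
                    ? hfileFirstKeys.get(i + 1) : "";
            entries.add(new RegionEntry(start, end));
        }
        return entries;
    }
}
```

The point of the convention is that the regions' key ranges tile the whole keyspace with no gaps or overlaps, which is what .META. requires.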

Someone needs to just do it.  We've been talking about it forever.

P.S. Here is older thinking on the topic

On Mon, Jul 27, 2009 at 1:31 PM, tim robertson <timrobertson100@gmail.com> wrote:

> Hi all,
> Ryan wrote on a different thread:
> "It should be possible to randomly insert data from a pre-existing
> data set.  There is some work to directly import straight into hfiles
> and skipping the regionserver, but that would only really work on 1
> time imports to new tables."
> Could someone please elaborate on this a little and outline the steps
> needed?  Do you write an hfile in a custom mapreduce output format and
> then somehow write the table metadata file afterwards?
> Cheers,
> Tim
