hbase-user mailing list archives

From Nanheng Wu <nanhen...@gmail.com>
Subject Re: Bulk load using HFileOutputFormat.RecordWriter
Date Thu, 06 Jan 2011 18:17:36 GMT
Thanks for the answer, Todd. I realized that I was making my life
harder by using the low-level record writer directly. Instead, I just
made the mapper output an <ImmutableBytesWritable, KeyValue> pair and
set the output format to HFileOutputFormat. It works really well! I
have a follow-up question: after I run the loadtable.rb script, it
takes a little while before the table is actually ready to be queried.
Is there a way to programmatically test whether the table is "ready"? I am
using hbase-0.20.6. Thanks!
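
A minimal sketch of how such a readiness test could be wrapped in a poll loop with a timeout. It does not answer what "ready" means for a table loaded by loadtable.rb; the check itself is passed in by the caller, and the names ReadyPoller and waitUntilReady are hypothetical. It also assumes Java 8 or later for BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

final class ReadyPoller {
    private ReadyPoller() {}

    /**
     * Calls isReady repeatedly until it returns true or timeoutMillis elapses.
     * Returns true if the check passed in time, false on timeout or interrupt.
     * The check is caller-supplied (an assumption of this sketch); nothing
     * here talks to HBase.
     */
    static boolean waitUntilReady(BooleanSupplier isReady,
                                  long timeoutMillis,
                                  long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isReady.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(intervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would pass its own probe, for example a lambda that attempts a read against the new table and returns false if the read fails, along with a timeout and a poll interval in milliseconds.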

On Wed, Jan 5, 2011 at 6:48 PM, Todd Lipcon <todd@cloudera.com> wrote:
> Hi Nanheng,
> It sounds like you're on the right path, but you're missing the
> "commit" step when using the output format.
> The layout of the output dir should look something like:
> output/
> output/colfam/
> output/colfam/234923423
> output/colfam/349593453  <-- these are just unique IDs
> Thanks
> -Todd
> On Wed, Jan 5, 2011 at 3:54 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>> Hi,
>> I am new to HBase and Hadoop, and I am trying to find the best way to
>> bulk load a table from HDFS into HBase. I don't mind creating a new
>> table for each batch, and from what I understand, using HFileOutputFormat
>> directly in an MR job is the most efficient method. My input data set
>> is already in sorted order, so it seems to me that I don't need
>> reducers, which would require a global sort of already sorted data.
>> I tried to use HFileOutputFormat.getRecordWriter in my mapper with 0
>> reducers, but the output directory contains only a _temporary directory
>> with my outputs in each subdirectory. That doesn't seem to be what the
>> loadtable script expects (a column family directory with HFiles). Can
>> someone tell me if what I am doing makes sense in general, or how to do
>> this properly? Thanks!
> --
> Todd Lipcon
> Software Engineer, Cloudera
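
The directory layout Todd describes in the quoted reply (an output directory containing one directory per column family, each holding HFiles, with no leftover _temporary directory) can be checked mechanically. The following is only a sketch under stated assumptions: it inspects a local filesystem path with java.nio.file, whereas real job output normally lives in HDFS and would need the Hadoop FileSystem API instead. The class and method names are hypothetical, and treating other names that start with "_" or "." as job metadata rather than column families is also an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class BulkLoadLayoutCheck {
    private BulkLoadLayoutCheck() {}

    /** Returns a list of layout problems; an empty list means the layout looks right. */
    static List<String> findProblems(Path outputDir) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(outputDir)) {
            problems.add("not a directory: " + outputDir);
            return problems;
        }
        int families = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(outputDir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (name.equals("_temporary")) {
                    // Output was written but never committed.
                    problems.add("uncommitted output: _temporary is still present");
                } else if (name.startsWith("_") || name.startsWith(".")) {
                    // Assumed to be job metadata, not a column family.
                    continue;
                } else if (Files.isDirectory(child)) {
                    families++;
                    if (!containsRegularFile(child)) {
                        problems.add("column family directory has no files: " + name);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (families == 0) {
            problems.add("no column family directories found");
        }
        return problems;
    }

    /** True if dir directly contains at least one regular file. */
    private static boolean containsRegularFile(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Run against the situation described in the original question (only a _temporary directory under the output path), this would report both the uncommitted output and the missing column family directories.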
