hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: bulk loaded value
Date Tue, 18 Jan 2011 23:25:01 GMT
Here is the doc you want to edit: src/site/xdoc/bulk-loads.xml.  Here
is where you do a checkout: http://hbase.apache.org/version_control.html
Thanks,
St.Ack

On Tue, Jan 18, 2011 at 3:23 PM, Stack <stack@duboce.net> wrote:
> Whatever is easier on you.   A patch is easier on us (open a JIRA to
> hang it on).  Thanks G,
> St.Ack
>
> On Tue, Jan 18, 2011 at 3:18 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
>> Ok, I'll take a crack at it. Do you want a literal patch file, or
>> just a suggestion for some better wording?
>>
>> -geoff
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: Tuesday, January 18, 2011 10:51 AM
>> To: user@hbase.apache.org
>> Cc: hbase-user@hadoop.apache.org
>> Subject: Re: bulk loaded value
>>
>> Any chance of a patch to the doc Geoff?  If you ran into the issue,
>> others will too.
>>
>> I'm glad you figured it out.
>>
>> St.Ack
>>
>> On Tue, Jan 18, 2011 at 9:24 AM, Geoff Hendrey <ghendrey@decarta.com> wrote:
>>> Thanks for your response. Here is what happened. I didn't realize
>>> that no matter what reducer you specify, when you use
>>> configureIncrementalLoad, HFileOutputFormat will ignore your reducer
>>> and use its own. Here is the code from configureIncrementalLoad:
>>
>>>
>>>    // Based on the configured map output class, set the correct reducer to properly
>>>    // sort the incoming values.
>>>    // TODO it would be nice to pick one or the other of these formats.
>>>    if (KeyValue.class.equals(job.getMapOutputValueClass())) {
>>>      job.setReducerClass(KeyValueSortReducer.class);
>>>    } else if (Put.class.equals(job.getMapOutputValueClass())) {
>>>      job.setReducerClass(PutSortReducer.class);
>>>    } else {
>>>      LOG.warn("Unknown map output value type:" + job.getMapOutputValueClass());
>>>    }
>>>
>>> That point wasn't clear to me from the docs. So of course, now I
>>> understand that no matter what I put in my reducer, the reducer never
>>> gets invoked. To work around this, I just output my data to a sequence
>>> file, and then use HFileOutputFormat with configureIncrementalLoad to
>>> bulk load the sequence file.
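>>>
>>> Roughly, that second job ends up wired like the sketch below. It is
>>> just an illustration: the class name, the table name "mytable", and
>>> the paths are placeholders, and it assumes the first job wrote
>>> (ImmutableBytesWritable, KeyValue) pairs to the sequence file.
>>>
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.hbase.KeyValue;
>>> import org.apache.hadoop.hbase.client.HTable;
>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>>> import org.apache.hadoop.mapreduce.Job;
>>> import org.apache.hadoop.mapreduce.Mapper;
>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>> import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>>
>>> public class SeqFileToHFiles {
>>>   public static void main(String[] args) throws Exception {
>>>     Job job = new Job(HBaseConfiguration.create(), "seqfile-to-hfiles");
>>>     job.setJarByClass(SeqFileToHFiles.class);
>>>
>>>     // The first job wrote (ImmutableBytesWritable, KeyValue) pairs to a sequence file.
>>>     job.setInputFormatClass(SequenceFileInputFormat.class);
>>>     FileInputFormat.addInputPath(job, new Path("/tmp/kv-seqfile"));      // placeholder path
>>>
>>>     // The stock Mapper just passes the pairs straight through unchanged.
>>>     job.setMapperClass(Mapper.class);
>>>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>>     job.setMapOutputValueClass(KeyValue.class);
>>>
>>>     // configureIncrementalLoad sets the output format, the total-order
>>>     // partitioning against the table's regions, and (since the map output
>>>     // value is KeyValue) the KeyValueSortReducer shown in the excerpt above.
>>>     HFileOutputFormat.configureIncrementalLoad(job,
>>>         new HTable(job.getConfiguration(), "mytable"));                   // placeholder table
>>>
>>>     FileOutputFormat.setOutputPath(job, new Path("/tmp/hfile-output"));   // placeholder path
>>>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>   }
>>> }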
>>>
>>> By the way, so far the performance of the bulk loader is amazing
>>> compared to trying to do batch inserts from a mapreduce job by doing
>>> Put from the reducer. Thanks.
>>>
>>> -g
>>>
>>> -----Original Message-----
>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>>> Sent: Monday, January 17, 2011 9:55 PM
>>> To: user@hbase.apache.org
>>> Cc: hbase-user@hadoop.apache.org
>>> Subject: Re: bulk loaded value
>>>
>>> What's 'key' in the below?  Is it a key of yours?  Is it some
>>> incrementing long?  When you create the KeyValue below, you are
>>> setting this long as your row value.
>>>
>>> The shell makes a best effort at stringifying everything it sees.  It
>>> passes all bytes via this function on emission:
>>> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/apidocs/index.html
>>> (Doc could be a bit better but you get the idea).
>>>
>>> By what you print below, it looks like it's a zero-padded long and
>>> we're outputting it in hex with each byte of the long escaped.
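>>>
>>> (The function is, I believe, Bytes.toStringBinary.  A quick throwaway
>>> illustration of why an int 1 prints that way; the class here is just
>>> for the example, not from HBase itself:)
>>>
>>> import org.apache.hadoop.hbase.util.Bytes;
>>>
>>> public class ShellEscaping {
>>>   public static void main(String[] args) {
>>>     // A 4-byte big-endian int 1: none of the bytes are printable ASCII,
>>>     // so each one is escaped as \xNN, giving \x00\x00\x00\x01.
>>>     System.out.println(Bytes.toStringBinary(Bytes.toBytes(1)));
>>>     // An 8-byte long 1 would print as eight escaped bytes instead.
>>>     System.out.println(Bytes.toStringBinary(Bytes.toBytes(1L)));
>>>   }
>>> }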
>>>
>>> St.Ack
>>>
>>> On Mon, Jan 17, 2011 at 9:30 AM, Geoff Hendrey <ghendrey@decarta.com> wrote:
>>>> Hi -
>>>>
>>>>
>>>>
>>>> I am using 0.89 for bulk loading. In my reducer:
>>>>
>>>>
>>>>
>>>>            ImmutableBytesWritable ibw =
>>>>                new ImmutableBytesWritable(key.copyBytes());
>>>>            KeyValue kv = new KeyValue(key.copyBytes(),
>>>>                Bytes.toBytes(context.getConfiguration().get("fam", "count")),
>>>>                Bytes.toBytes("c"), 100L, "hello".getBytes());
>>>>            context.write(ibw, kv);
>>>>
>>>>
>>>>
>>>> The keys and the timestamp seem to go into HBase fine when I use the
>>>> 'completebulkload' bulk loader, but the hbase shell shows
>>>> value=\x00\x00\x00\x01 for every value. The keys and timestamps are
>>>> fine. I've read all the relevant docs and posts that I can find on
>>>> this, but I'm still scratching my head about what I am doing wrong to
>>>> generate the value.
>>>>
>>>>
>>>>
>>>> -g
>>>>
>>>>
>>>
>>
>
