hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: bulk loaded value
Date Tue, 18 Jan 2011 23:23:43 GMT
Whatever is easier on you.  A patch is easier on us (open a JIRA to
hang it on).  Thanks G,
St.Ack

On Tue, Jan 18, 2011 at 3:18 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> Ok, I'll take a crack at it. Do you want a literal patch file, or just a
> suggestion for some better wording?
>
> -geoff
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Tuesday, January 18, 2011 10:51 AM
> To: user@hbase.apache.org
> Cc: hbase-user@hadoop.apache.org
> Subject: Re: bulk loaded value
>
> Any chance of a patch to the doc, Geoff?  If you ran into the issue,
> others will too.
>
> I'm glad you figured it out.
>
> St.Ack
>
> On Tue, Jan 18, 2011 at 9:24 AM, Geoff Hendrey <ghendrey@decarta.com> wrote:
>> Thanks for your response. Here is what happened. I didn't realize that no
>> matter what reducer you specify, when you use configureIncrementalLoad,
>> HFileOutputFormat will ignore your reducer and substitute its own. Here is the
>> code from configureIncrementalLoad:
>
>>
>>    // Based on the configured map output class, set the correct reducer to properly
>>    // sort the incoming values.
>>    // TODO it would be nice to pick one or the other of these formats.
>>    if (KeyValue.class.equals(job.getMapOutputValueClass())) {
>>      job.setReducerClass(KeyValueSortReducer.class);
>>    } else if (Put.class.equals(job.getMapOutputValueClass())) {
>>      job.setReducerClass(PutSortReducer.class);
>>    } else {
>>      LOG.warn("Unknown map output value type:" + job.getMapOutputValueClass());
>>    }
>>
>> That point wasn't clear to me from the docs. So of course, now I understand
>> that no matter what I put in my reducer, my reducer never gets invoked. To
>> work around this, I first output my data to a sequence file in one job, and
>> then use HFileOutputFormat with configureIncrementalLoad in a second job to
>> bulk load the sequence file.
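>>
>> A minimal sketch of that two-job workaround (the paths, job names,
>> MyRealReducer, and the HTable handle 'table' are my placeholders, not the
>> real job; assumes the usual org.apache.hadoop.mapreduce.lib.* and
>> org.apache.hadoop.hbase.mapreduce.* imports):
>>
>>    // Job 1: my real reducer runs here and writes its
>>    // (ImmutableBytesWritable, KeyValue) pairs to a SequenceFile.
>>    // (Mapper, input format, and jar setup omitted.)
>>    Job prepare = new Job(conf, "prepare-keyvalues");
>>    prepare.setReducerClass(MyRealReducer.class);
>>    prepare.setOutputFormatClass(SequenceFileOutputFormat.class);
>>    prepare.setOutputKeyClass(ImmutableBytesWritable.class);
>>    prepare.setOutputValueClass(KeyValue.class);
>>    FileOutputFormat.setOutputPath(prepare, new Path("/tmp/prepared"));
>>    prepare.waitForCompletion(true);
>>
>>    // Job 2: the default identity mapper replays the SequenceFile, and
>>    // configureIncrementalLoad installs KeyValueSortReducer, the total
>>    // order partitioner, and HFileOutputFormat, so the output is ready
>>    // for completebulkload.
>>    Job load = new Job(conf, "write-hfiles");
>>    load.setInputFormatClass(SequenceFileInputFormat.class);
>>    FileInputFormat.addInputPath(load, new Path("/tmp/prepared"));
>>    load.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>    load.setMapOutputValueClass(KeyValue.class);
>>    HFileOutputFormat.configureIncrementalLoad(load, table);
>>    FileOutputFormat.setOutputPath(load, new Path("/tmp/hfiles"));
>>    load.waitForCompletion(true);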
>>
>> By the way, so far the performance of the bulk loader is amazing compared to
>> doing batch inserts from a MapReduce job with Puts in the reducer. Thanks.
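>>
>> For reference, the slower approach I was comparing against looked roughly
>> like this (MyPutReducer and the table name "mytable" are placeholders):
>>
>>    // Wires the job to send Puts through the normal client write path
>>    // via TableOutputFormat, instead of writing HFiles.
>>    TableMapReduceUtil.initTableReducerJob("mytable", MyPutReducer.class, job);
>>
>>    public static class MyPutReducer
>>        extends TableReducer<ImmutableBytesWritable, Text, ImmutableBytesWritable> {
>>      protected void reduce(ImmutableBytesWritable key, Iterable<Text> values,
>>          Context context) throws IOException, InterruptedException {
>>        Put put = new Put(key.copyBytes());
>>        // family "count", qualifier "c", mirroring the KeyValue example below
>>        put.add(Bytes.toBytes("count"), Bytes.toBytes("c"),
>>            Bytes.toBytes(values.iterator().next().toString()));
>>        context.write(key, put);
>>      }
>>    }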
>>
>> -g
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: Monday, January 17, 2011 9:55 PM
>> To: user@hbase.apache.org
>> Cc: hbase-user@hadoop.apache.org
>> Subject: Re: bulk loaded value
>>
>> What's 'key' in the below?  Is it a key of yours?  Is it some
>> incrementing long?  When you create the KeyValue below, you are
>> setting this long as your row key.
>>
>> The shell does a best effort at stringifying everything it sees.  It
>> passes all bytes through this function on emission:
>> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/apidocs/index.html
>> (The doc could be a bit better, but you get the idea.)
>>
>> By what you print below, it looks like it's a zero-padded integer, and
>> we're outputting it in hex with each byte escaped.
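>>
>> If I recall right, the function in question is Bytes.toStringBinary. A
>> quick illustration of why the shell prints \x00\x00\x00\x01:
>>
>>    import org.apache.hadoop.hbase.util.Bytes;
>>
>>    public class ShellEscapeDemo {
>>      public static void main(String[] args) {
>>        // Bytes.toBytes(1) serializes the int 1 as four bytes: 00 00 00 01.
>>        byte[] value = Bytes.toBytes(1);
>>        // toStringBinary prints printable ASCII as-is and escapes every
>>        // other byte as \xNN, which is exactly what the shell shows.
>>        System.out.println(Bytes.toStringBinary(value));  // \x00\x00\x00\x01
>>      }
>>    }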
>>
>> St.Ack
>>
>> On Mon, Jan 17, 2011 at 9:30 AM, Geoff Hendrey <ghendrey@decarta.com> wrote:
>>> Hi -
>>>
>>> I am using HBase 0.89 for bulk loading. In my reducer:
>>>
>>>
>>>            // row key for HFileOutputFormat
>>>            ImmutableBytesWritable ibw =
>>>                new ImmutableBytesWritable(key.copyBytes());
>>>
>>>            // KeyValue(row, family, qualifier, timestamp, value); the
>>>            // family defaults to "count" when the "fam" property is unset
>>>            KeyValue kv = new KeyValue(key.copyBytes(),
>>>                Bytes.toBytes(context.getConfiguration().get("fam", "count")),
>>>                Bytes.toBytes("c"), 100L, "hello".getBytes());
>>>
>>>            context.write(ibw, kv);
>>>
>>> The keys and the timestamps seem to go into HBase fine when I use the
>>> 'completebulkload' bulk loader, but the hbase shell shows
>>> value=\x00\x00\x00\x01 for every value. I've read all the relevant docs
>>> and posts that I can find on this, but I'm still scratching my head
>>> about what I am doing wrong to generate that value.
>>>
>>> -g
>>>
>>
>
