hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: questions on usage of output collector
Date Tue, 25 Nov 2008 18:05:21 GMT
Nishant Khurana wrote:
> Hi,
> I think that works for me. What I meant below was that the Output Collector
> of TableReduce only collects ImmutableBytesWritable and BatchUpdate type of
> key and value. 

Yeah.  These are the types needed by HTable doing its commit so it would 
seem to make sense that this is what should come out of the reduce step.

Looking at code, it doesn't look like this would be easy to change; 
TableReduce is just an interface but bulk of the insert work is done in 
TableOutputFormat.  If you want to work on making TOF take other types, 
just say and we can try and work it through together (TOF would need to 
be made more generic).

> So I was asking how can I use other datatypes while writing
> back to Hbase tables using TableReduce. But seems either my mapper or
> reducer has to convert my datatypes into above mentioned using the method
> you suggested and then pass it on to output collector of table reduce.
> Let me know if I am missing something.

Yeah, IBW and BU are what you need to produce to do your hbase insert.

> Also, how can I store multiple values for the same column in Hbase. Like a
> movie id containing 5 genres all coming under column genre. My mapper was
> extracting a comma separated list of genres from a text file for a movie id
> and separating it to produce id, genre pair. Then I passed them to reducer
> to add it to table but BatchUpdate seem to overwrite the previous entries by
> the last one. Can I store all values in the same column ?
Well, as long as they are all emitted from the mapper with the same key, 
they should all show up in the reduce grouped together under that key, 
with the values available through an Iterator.  How are you doing your 
map emissions?

For example, if you emit from your map with a key of movieid (say as Text 
or as IBW) and then each of the genres as values (again as either IBW or 
Text), then your reducer should be passed a key of movieid and then an 
Iterator over the Text or IBW of genres.

You'd then in your reducer create a BatchUpdate and do 
BU.put("genre:genretype", genrevalue).... and convert the key to IBW if 
not already and emit this from your reduce?
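
A rough sketch of that reduce logic in plain Java, with no HBase classes 
involved so it runs on its own.  The class GenreColumns and its methods 
are made up for illustration, the map stands in for the BatchUpdate, and 
using the genre name as the column qualifier is only one possible reading 
of "genre:genretype".  The point is that each genre gets its own column 
name, so a later put does not overwrite an earlier one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the reduce step; names here are hypothetical.
class GenreColumns {

  // Encode an int movie id as 4 big-endian bytes, chosen here for
  // illustration as the row key encoding.
  static byte[] rowKey(int movieId) {
    return ByteBuffer.allocate(4).putInt(movieId).array();
  }

  // Build one column per genre.  The map plays the role of the BatchUpdate:
  // putting the same column name twice keeps only the last value, which is
  // why every genre needs a distinct qualifier after "genre:".
  static Map<String, byte[]> toColumns(Iterator<String> genres) {
    Map<String, byte[]> columns = new LinkedHashMap<>();
    while (genres.hasNext()) {
      String genre = genres.next().trim();
      if (genre.isEmpty()) {
        continue; // skip blanks left over from splitting a comma list
      }
      columns.put("genre:" + genre, genre.getBytes(StandardCharsets.UTF_8));
    }
    return columns;
  }

  public static void main(String[] args) {
    Iterator<String> genres =
        Arrays.asList("Comedy", "Drama", "Romance").iterator();
    Map<String, byte[]> columns = toColumns(genres);
    System.out.println("row key bytes: " + Arrays.toString(rowKey(42)));
    for (Map.Entry<String, byte[]> e : columns.entrySet()) {
      System.out.println(
          e.getKey() + " = " + new String(e.getValue(), StandardCharsets.UTF_8));
    }
  }
}
```

In a real TableReduce you would do the same loop but call put on a 
BatchUpdate for each column and emit it with the IBW key.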

Pardon me if I'm stating what you already know.


> Thanks
> On Mon, Nov 24, 2008 at 1:50 PM, stack <stack@duboce.net> wrote:
>> Nishant Khurana wrote:
>>> Hi,
>>> I was writing a mapreduce class to read from a text file and write the
>>> entries to a table. My Map function reads each line and outputs a key and
>>> a MapWritable as value. I was wondering while writing reduce using
>>> TableReduce, how to convert the key (IntWritable) to ImmutableBytesWritable
>>> and MapWritable object to BatchUpdate so that my outputcollector doesn't
>>> complain in reduce function. It seems to enforce the signature where it
>>> collects the above two datatypes only.
>> For the key, would something like the below work for you:
>> // Let 'key' be the IntWritable passed to the reduce. key.get() returns an int.
>> // Bytes has a bunch of overloads for different types returning byte [].
>> ImmutableBytesWritable ibw = new ImmutableBytesWritable(Bytes.toBytes(key.get()));
>> For the MapWritable to BatchUpdate, how about:
>>       // Again, let 'key' be the passed IntWritable key.  To make a byte
>>       // array of it, use Bytes.toBytes.
>>       BatchUpdate bu = new BatchUpdate(Bytes.toBytes(key.get()));
>>       // Let 'v' be the Iterator over the MapWritables passed to this reduce.
>>       while (v.hasNext()) {
>>         HbaseMapWritable<SomeWritable, SomeWritable> hmw = v.next();
>>         for (Entry<SomeWritable, SomeWritable> e: hmw.entrySet()) {
>>           // Column name comes from the entry key, cell value from the entry value.
>>           bu.put(Bytes.toBytes(e.getKey()), Bytes.toBytes(e.getValue()));
>>         }
>>       }
>> For 0.19.0 hbase, there is an example that does similar to what you are up
>> to under src/examples/mapred though I think it might depend on a recent fix
>> to HbaseMapWritable that allowed it take byte array as value, not just
>> Writables.
>>> Also I believe I can only use above two datatypes while using table reduce
>>> but couldn't understand them very well. How can I convert any datatype to
>>> the above two to write them to the tables.
>> Please say more.  I don't think I follow exactly (and would like to fix
>> this for 0.19.0 if it's what I think you are saying).
>> St.Ack
