hadoop-general mailing list archives

From "Kilbride, James P." <James.Kilbr...@gd-ais.com>
Subject RE: MapReduce HBASE examples
Date Tue, 06 Jul 2010 16:38:51 GMT
This is an interesting start, but I'm really interested in the opposite direction, where HBase
is the input to my MapReduce job. I'm then going to push some data into reducers, and
ultimately I'm okay with them just writing it to a file.

I get the impression that I need to set up a TableInputFormat type of object. But since
job only allows you to do setInputFormatClass, I'm not sure how to dynamically configure the
inputFormatClass to accept some parameters to limit the input format scan on the table to
only specific rows. Here's the general thrust of what I'm trying to do with MapReduce and HBase.

I have a table called People which has rows of people (names, IDs, whatever is used for identifying
a person in the system). That table also has a column family called relatives, where the column
IDs are the names of the person's relatives. I want to pass into the inputFormat object
the names of the people I want it to look up. The mapper should then get the person's name as
the key and the relatives column family as the value (that's the result of the scan limitations
I'm putting into place).

I will then retrieve the relatives (in the map function), look at the relationships between them,
and push onto the context the relative's name (keyOut) and a floating point value (valueOut).
The reducer will combine all these floating point values for each relative and output (a
file is fine) the relative's name and cumulative score.
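
A minimal sketch of the data flow described above, in plain Java. It deliberately
does not use the Hadoop or HBase APIs, so it says nothing about how the input format
should be configured. The class name, the in-memory "table", and the score() function
are all hypothetical placeholders; the real scoring logic is not specified here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RelativeScores {

    // Hypothetical placeholder: the real job would derive this value from the
    // relationships between relatives. Here every occurrence counts as 1.0.
    static double score(String person, String relative) {
        return 1.0;
    }

    // "Map" step: for one person row, emit a (relative name, score) pair per relative.
    static List<Map.Entry<String, Double>> map(String person, List<String> relatives) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        for (String relative : relatives) {
            out.add(new SimpleEntry<>(relative, score(person, relative)));
        }
        return out;
    }

    // "Reduce" step: sum all scores emitted for the same relative.
    static Map<String, Double> reduce(List<Map.Entry<String, Double>> pairs) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Double::sum);
        }
        return totals;
    }

    // Restrict the input to the requested people, then run map and reduce.
    // In the real job this restriction would be applied when scanning the table.
    static Map<String, Double> run(Map<String, List<String>> people, Set<String> wanted) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (String person : wanted) {
            List<String> relatives = people.get(person);
            if (relatives != null) {
                emitted.addAll(map(person, relatives));
            }
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        Map<String, List<String>> people = new HashMap<>();
        people.put("alice", Arrays.asList("bob", "carol"));
        people.put("dave", Arrays.asList("bob"));
        people.put("erin", Arrays.asList("zed"));

        Set<String> wanted = new HashSet<>(Arrays.asList("alice", "dave"));
        // Prints one "relative<TAB>cumulative score" line per relative.
        run(people, wanted).forEach((name, total) -> System.out.println(name + "\t" + total));
    }
}
```

Only the rows named in `wanted` are read, so relatives of people who were not asked
for never reach the reduce step.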

But I can't seem to figure out how to set up a job that uses the TableInputFormat I want,
and which also allows me to set the parameter for it so that it will only give me the people
I ask for when I run the program, not the entire table.

Does this make any sense?

James Kilbride

-----Original Message-----
From: Harsh J [mailto:qwertymaniac@gmail.com] 
Sent: Tuesday, July 06, 2010 12:10 PM
To: general@hadoop.apache.org
Subject: Re: MapReduce HBASE examples

I believe this article will help you understand the new (not anymore)
API+HBase MR: http://kdpeterson.net/blog/2009/09/minimal-hbase-mapreduce-example.html
[Look at the second example, which uses the Put object]

On Tue, Jul 6, 2010 at 6:08 PM, Kilbride, James P.
<James.Kilbride@gd-ais.com> wrote:
> All,
> The examples in the hbase examples, and on the hadoop wiki all reference the deprecated
> interfaces of the mapred package. Are there any examples of how to use hbase as the input
> for a mapreduce job, that uses the mapreduce package instead? I'm looking to set up a job
> which will read from a hbase table based on a row value passed into the job, and which starts
> the map the row values(as the value key) and the column names(or value) as the map values.
> James Kilbride

Harsh J
