hbase-user mailing list archives

From abhinit <abhinit.ku...@gmail.com>
Subject Re: implementing join on two Hbase tables
Date Sat, 06 Dec 2008 00:16:55 GMT
Thanks, I will have a look at what you mentioned.
I have another question. In Pig Latin, data analysis tasks are expressed as
Pig Latin has join and cogroup operators which do the task using
on Hadoop. Can anyone share how the Pig Latin implementation does it?


On Fri, Dec 5, 2008 at 1:34 PM, Jonathan Gray <jlist@streamy.com> wrote:

> I'm not aware of anything that is completely equipped for the task, however
> this could be done more simply with one of the Hadoop MapReduce tools.
> My personal favorite is Cascading (http://www.cascading.org) by Chris
> Wensel.  This can help you with doing something like reading in two
> different tables from two different Maps and bringing them together.
> Unfortunately, there is not yet an HBase Tap.  If you're interested in
> developing one, I have been told that it should not be difficult.  Check
> out #cascading on freenode and you should be able to get some help.  If you go
> down that route, please let me know because I'm interested in an HBase Tap
> as well but have not had the time to work on it.
> Hive and Pig are other projects that help with this, but they also do not
> have HBase hooks yet (that I'm aware of).
> You might also consider something like Pigi (http://www.pigi-project.org),
> which is an ORM.  It supports indexing and searching, unsure if there are
> any mechanisms for joins available or planned.
> Otherwise, you'll need to write your own jobs.  You'd need probably three
> different MR jobs.  Two that Map from each of the HBase tables you're
> interested in.  Then another job that would read from combined output of
> those two jobs and perform the join.  You might use the Map->Reduce sort
> step to perform the join if possible, depends on the details of what you
> want to do.  If you go down this path, you can certainly get plenty of help
> from this list or the IRC channel #hbase as this would be very useful to
> the community.
> JG
> > -----Original Message-----
> > From: abhinit [mailto:abhinit.kumar@gmail.com]
> > Sent: Friday, December 05, 2008 2:32 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: implementing join on two Hbase tables
> >
> > I am trying to implement hash-join and nested join on two HBase tables.
> > However, I am stuck.
> >
> > I came across the package *org.apache.hadoop.mapred.join* which joins
> > two sorted datasets before map. However, I want to implement joins
> > using map/reduce methods so that I have more control over how to join
> > the data.
> >
> > I found the package *org.apache.hadoop.contrib.utils.join* after a bit
> > of searching, which has something I am looking for (not too sure as I
> > have not read the code completely).
> > It would be great if someone who has used this package could give me a
> > pointer on my problem.
> >
> > Is there a way I can take two tables as input in TableMap's map method?
> > (my guess is no)
> > If not, does the current hadoop/hbase implementation provide features
> > for implementing user-defined joins?
> >
> > Thanks a lot
> > -Abhinit
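
For readers following the thread, the approach Jonathan outlines (map each
table separately, then let the Map->Reduce sort step bring matching records
together) can be sketched in plain Python. This is only an in-memory
illustration of a reduce-side join, not HBase or Hadoop code: the table
contents, the "dept" join column, and every function name below are
hypothetical, and each table is assumed to be a dict of row key to columns.

```python
from collections import defaultdict
from itertools import product


def map_table(table_name, rows, join_column):
    """Map step for one table: emit (join_key, (table_name, row_key))."""
    for row_key, columns in rows.items():
        if join_column in columns:  # rows without the join column are skipped
            yield columns[join_column], (table_name, row_key)


def shuffle(records):
    """Stand-in for the sort/shuffle step: group tagged records by join key."""
    groups = defaultdict(list)
    for join_key, tagged in records:
        groups[join_key].append(tagged)
    return sorted(groups.items())


def reduce_join(join_key, tagged, left_name, right_name):
    """Reduce step: pair every left row with every right row for one key."""
    left = [row_key for name, row_key in tagged if name == left_name]
    right = [row_key for name, row_key in tagged if name == right_name]
    for left_key, right_key in product(left, right):
        yield join_key, left_key, right_key


def reduce_side_join(left_name, left_rows, right_name, right_rows, join_column):
    """Inner join of two tables on join_column, returning matched row keys."""
    records = list(map_table(left_name, left_rows, join_column))
    records.extend(map_table(right_name, right_rows, join_column))
    joined = []
    for join_key, tagged in shuffle(records):
        joined.extend(reduce_join(join_key, tagged, left_name, right_name))
    return joined


# Hypothetical sample data standing in for two HBase tables.
users = {
    "u1": {"dept": "eng", "name": "Ann"},
    "u2": {"dept": "ops", "name": "Bob"},
    "u3": {"dept": "eng", "name": "Cy"},
}
depts = {
    "d1": {"dept": "eng", "floor": "3"},
    "d2": {"dept": "hr", "floor": "1"},
}

result = reduce_side_join("users", users, "depts", depts, "dept")
```

Tagging each record with its source table is what lets a single reduce call
tell the two sides apart. In a real job the two map steps would run as
separate jobs (or separate inputs) and the grouping would be done by the
framework rather than by a dict.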

