hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: job taking input file, which "is being" written by its preceding job's map phase
Date Sat, 11 Feb 2012 13:54:34 GMT

On Sat, Feb 11, 2012 at 12:21 PM, Vamshi Krishna <vamshi2105@gmail.com> wrote:
> Hi Harsh, I am trying to find all the rowkeys that are present in two
> tables. If userid is the rowkey for two different tables, I want to find all
> the rowkeys present in both tables. For that I need to read from two
> tables in a mapreduce job, i.e. I want to take multiple tables as input to
> a mapreduce job, so that I can check for the intersection. How can I do
> that?

You should probably revisit your schema to eke out a better design if
you've come to a point where joins are required - two tables carrying
the same rowkeys usually suggests a design problem (though it depends).
Try going over the schema design portions of "HBase: The Definitive Guide".
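
If you do need the intersection with the schema as it stands, a
reduce-side join is one common way to get it. Below is a minimal sketch of
just the logic in plain Java, with no HBase or MapReduce classes involved:
the "map" step tags each rowkey with the table it was read from, and the
"reduce" step keeps the rowkeys that were seen with both tags. The class
name, method names, the T1/T2 tags and the sample user ids are all made up
for illustration; in a real job the tagging would happen in your mappers
and the grouping would be done by the shuffle.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, in-memory illustration of a reduce-side join on rowkeys.
public class RowKeyIntersection {

    // Stand-in for a map task: emit (rowkey, tableName) for every rowkey read.
    static List<Map.Entry<String, String>> tag(String tableName, List<String> rowKeys) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String key : rowKeys) {
            out.add(new SimpleEntry<>(key, tableName));
        }
        return out;
    }

    // Stand-in for shuffle + reduce: group the tags by rowkey and keep
    // only the rowkeys that were emitted from both tables.
    static Set<String> intersect(List<String> t1Keys, List<String> t2Keys) {
        List<Map.Entry<String, String>> mapped = new ArrayList<>(tag("T1", t1Keys));
        mapped.addAll(tag("T2", t2Keys));

        Map<String, Set<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : mapped) {
            Set<String> tags = grouped.get(e.getKey());
            if (tags == null) {
                tags = new HashSet<>();
                grouped.put(e.getKey(), tags);
            }
            tags.add(e.getValue());
        }

        Set<String> common = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : grouped.entrySet()) {
            if (e.getValue().size() == 2) { // seen in both T1 and T2
                common.add(e.getKey());
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<String> t1 = Arrays.asList("user1", "user2", "user3");
        List<String> t2 = Arrays.asList("user2", "user3", "user4");
        System.out.println(intersect(t1, t2)); // prints [user2, user3]
    }
}
```

Because the reducer only needs to know which tables a rowkey came from,
the mappers can emit the table name alone as the value and skip the row
contents, which keeps the shuffle small.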

> One more doubt I have: if two jobs have Htable=new HTable(config, "HT");
> (HT is the HBase table I have created) in their respective maps, and these
> two jobs are reading from other tables T1, T2 and putting into the HT table,
> will there be any problem?

No, there shouldn't be a problem but the process may be slow (you're
doing a join of sorts).

> Caused by: org.apache.hadoop.hbase.TableNotFoundException:
> HsetSIintermediate

Regarding your stack trace: apparently one of the tables you requested
(HsetSIintermediate, per the exception) does not exist yet. ^^

Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
