hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Question about MapReduce
Date Thu, 15 Oct 2009 21:26:30 GMT
Comments inline


On Thu, Oct 15, 2009 at 2:20 PM, Something Something <luckyguy2050@yahoo.com> wrote:

> I have 3 HTables.... Table1, Table2 & Table3.
> I have 3 different flat files.  One contains keys for Table1, 2nd contains
> keys for Table2 & 3rd contains keys for Table3.
>
> Use case:  For every combination of these 3 keys, I need to perform some
> complex calculation and save the result in another HTable.  In other words,
> I need to calculate values for the following combos:
>
> (1,1,1) (1,1,2).......   (1,1,N) (1,2,1) (1,3,1) & so on....
>
> So I figured the best way to do this is to start a MapReduce Job for each
> of these combinations.


Maybe not... It'll depend on how many keys you are reading in from the files.
MapReduce is a good fit when you have a lot of repetitive tasks to be done.


> The MapReduce will get (Key1, Key2, Key3) as input, then read Table1,
> Table2 & Table3 with these keys and perform the calculations.  Is this the
> correct approach?  If it is, I need to pass Key1, Key2 & Key3 to the Mapper
> & Reducer.  What's the best way to do this?
>

If you have to do this using MR, here's what I'd recommend:

Write an MR job that creates the combinations of keys and stores them in a flat
file.
e.g.: File A has a1,a2,a3; File B has b1,b2,b3; File C has c1,c2,c3.
Now create a fourth file which contains combinations like
a1,b1,c1
a1,b1,c2
.
.
.
a2,b1,c1
.
.

So on and so forth.
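Actually, if the three key files are small enough to fit in memory, that first
step doesn't even need MR - a quick local program will do. Here's a rough sketch;
the file names and output path are just placeholders:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Writes the cross product of the three key files, one combination per line.
// "keysA.txt", "keysB.txt", "keysC.txt" and "combinations.txt" are placeholders.
public class KeyCombinations {
  public static void main(String[] args) throws IOException {
    List<String> aKeys = Files.readAllLines(Paths.get("keysA.txt"), StandardCharsets.UTF_8);
    List<String> bKeys = Files.readAllLines(Paths.get("keysB.txt"), StandardCharsets.UTF_8);
    List<String> cKeys = Files.readAllLines(Paths.get("keysC.txt"), StandardCharsets.UTF_8);

    try (PrintWriter out = new PrintWriter("combinations.txt", "UTF-8")) {
      for (String a : aKeys)
        for (String b : bKeys)
          for (String c : cKeys)
            out.println(a + "," + b + "," + c);   // e.g. "a1,b1,c1"
    }
  }
}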

Then have another MR job that scans this file, reads the three keys from the
three tables respectively, and does the computation you want.
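Roughly, the mapper of that second job could look something like the sketch
below. I'm assuming the old (pre-1.0) HTable client API here, and the table
names, column family/qualifier, and the doComplexCalculation() helper are
placeholders for whatever you actually have:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only job: each input line is "key1,key2,key3" from the combinations file.
// The mapper does three Gets, computes, and writes the result straight to an
// output table, so the job itself can use NullOutputFormat and zero reducers.
public class ComboMapper extends Mapper<LongWritable, Text, Text, Text> {

  private HTable table1, table2, table3, resultTable;

  @Override
  protected void setup(Context context) throws IOException {
    table1 = new HTable(context.getConfiguration(), "Table1");
    table2 = new HTable(context.getConfiguration(), "Table2");
    table3 = new HTable(context.getConfiguration(), "Table3");
    resultTable = new HTable(context.getConfiguration(), "Results");
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] keys = line.toString().split(",");

    // One Get per table, using the key that belongs to it.
    Result r1 = table1.get(new Get(Bytes.toBytes(keys[0])));
    Result r2 = table2.get(new Get(Bytes.toBytes(keys[1])));
    Result r3 = table3.get(new Get(Bytes.toBytes(keys[2])));

    byte[] value = doComplexCalculation(r1, r2, r3);   // your calculation goes here

    Put put = new Put(Bytes.toBytes(keys[0] + "," + keys[1] + "," + keys[2]));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("result"), value);
    resultTable.put(put);
  }

  private byte[] doComplexCalculation(Result r1, Result r2, Result r3) {
    // Placeholder for the real computation.
    return Bytes.toBytes(r1.size() + r2.size() + r3.size());
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table1.close(); table2.close(); table3.close();
    resultTable.close();
  }
}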

By the way, I'm not clear why you are using HBase here. Just wondering...
Can you explain?


>
> At this time, I don't need to join these tables in MapReduce, but in future
> I might have to.
>
>
Joins will be a different game altogether. We'll see when you get to it.


> Thanks.
>
>
>
>
> ________________________________
> From: Kevin Peterson <kpeterson@biz360.com>
> To: hbase-user@hadoop.apache.org
> Sent: Thu, October 15, 2009 11:39:22 AM
> Subject: Re: Question about MapReduce
>
> On Thu, Oct 15, 2009 at 11:30 AM, Something Something <luckyguy2050@yahoo.com> wrote:
>
> > 1) I don't think TableInputFormat is useful in this case.  Looks like it's
> > used for scanning columns from a single HTable.
> > 2) TableMapReduceUtil - same problem.  Seems like this works with just one
> > table.
> > 3) JV recommended NLineInputFormat, but my parameters are not in a file.
> > They come from multiple files and are in memory.
> >
> > I guess what I am looking for is something like... InMemoryInputFormat...
> > similar to FileInputFormat & DbInputFormat.  There's no such class right
> > now.
> >
> > Worse comes to worst, I can write the parameters into a flat file, and use
> > FileInputFormat - but that will slow down this process considerably.  Is
> > there no other way?
> >
> So you need to pull input from multiple tables at once? Are you expecting
> to do a join on these tables? If you explain what the data looks like, we'd
> understand better. What are your tables, and what would you like to treat as
> a single input record?
>
>
>
>
>
