hadoop-general mailing list archives

From Something Something <luckyguy2...@yahoo.com>
Subject Re: Question about MapReduce
Date Thu, 15 Oct 2009 21:20:45 GMT
I have 3 HTables.... Table1, Table2 & Table3.
I have 3 different flat files.  One contains keys for Table1, 2nd contains keys for Table2
& 3rd contains keys for Table3.

Use case:  For every combination of these 3 keys, I need to perform some complex calculation
and save the result in another HTable.  In other words, I need to calculate values for the
following combos:

(1,1,1) (1,1,2).......   (1,1,N) (1,2,1) (1,3,1) & so on....

So I figured the best way to do this is to start a MapReduce Job for each of these combinations.
 The MapReduce will get (Key1, Key2, Key3) as input, then read Table1, Table2 & Table3
with these keys and perform the calculations.  Is this the correct approach?  If it is, I
need to pass Key1, Key2 & Key3 to the Mapper & Reducer.  What's the best way to do this?

At this time, I don't need to join these tables in MapReduce, but in future I might have to.
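
One alternative to launching a job per combination is to write every (Key1, Key2, Key3) triple as one line of a flat file and feed that file to a single job, for example through NLineInputFormat as mentioned in the quoted thread. The following is only a minimal sketch of that idea, assuming the keys are strings that contain no tab characters. KeyCombinations, combinations and parse are hypothetical names, not part of Hadoop or HBase, and the job and mapper wiring is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not a Hadoop or HBase class.
final class KeyCombinations {
    private KeyCombinations() {}

    // Builds one tab-separated line per (key1, key2, key3) combination.
    // Assumes no key contains a tab character.
    static List<String> combinations(List<String> keys1,
                                     List<String> keys2,
                                     List<String> keys3) {
        List<String> lines = new ArrayList<>();
        for (String k1 : keys1) {
            for (String k2 : keys2) {
                for (String k3 : keys3) {
                    lines.add(k1 + "\t" + k2 + "\t" + k3);
                }
            }
        }
        return lines;
    }

    // Splits one input line back into its three keys, as a map() call
    // could do with the line it receives as its input value.
    static String[] parse(String line) {
        String[] keys = line.split("\t", -1);
        if (keys.length != 3) {
            throw new IllegalArgumentException("expected 3 keys: " + line);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> lines = combinations(
                Arrays.asList("1", "2"),
                Arrays.asList("1", "2"),
                Arrays.asList("1", "2"));
        // Each printed line would be one input record for the job.
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Each mapper call would then look up Table1, Table2 and Table3 with the three parsed keys. Whether this beats one job per combination depends on the number of combinations and the cost of the lookups.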


From: Kevin Peterson <kpeterson@biz360.com>
To: hbase-user@hadoop.apache.org
Sent: Thu, October 15, 2009 11:39:22 AM
Subject: Re: Question about MapReduce

On Thu, Oct 15, 2009 at 11:30 AM, Something Something <luckyguy2050@yahoo.com> wrote:

> 1) I don't think TableInputFormat is useful in this case.  Looks like it's
> used for scanning columns from a single HTable.
> 2) TableMapReduceUtil - same problem.  Seems like this works with just one
> table.
> 3) JV recommended NLineInputFormat, but my parameters are not in a file.
>  They come from multiple files and are in memory.
> I guess what I am looking for is something like... InMemoryInputFormat...
> similar to FileInputFormat & DbInputFormat.  There's no such class right
> now.
> Worse comes to worst, I can write the parameters into a flat file, and use
> FileInputFormat - but that will slow down this process considerably.  Is
> there no other way?

So you need to pull input from multiple tables at once? Are you expecting
to do a join on these tables? If you explain what the data looks like, we'd
understand better. What are your tables, and what would you like to treat as
a single input record?
