hadoop-mapreduce-user mailing list archives

From Michael Parker <michael.g.par...@gmail.com>
Subject Re: Side-loading output from one MR into another?
Date Thu, 23 Aug 2012 22:57:32 GMT
Actually, I was able to do some tricks and reduce the size to
something that can be held in memory.
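
A rough sketch of what the in-memory approach could look like, for anyone following the thread. It is plain Java with no Hadoop dependency, so it only illustrates the lookup logic. In a real job the loading step would presumably run once per task (for example in the new API's Mapper.setup(), reading a file shipped through the distributed cache). The class name, method names, and the tab-separated "user<TAB>cohort" layout are assumptions made for illustration, not details from the thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Hadoop and not the poster's actual code.
class CohortLookup {

    // Build the user-to-cohort map from "user<TAB>cohort" lines,
    // i.e. the (assumed) text output of the first MapReduce job.
    static Map<String, String> loadCohorts(Iterable<String> lines) {
        Map<String, String> cohorts = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                cohorts.put(parts[0], parts[1]);
            }
        }
        return cohorts;
    }

    // Stand-in for the second job's map(): tag an event with the user's
    // cohort, or return null when the user has no cohort assigned.
    static String mapEvent(Map<String, String> cohorts, String user, String event) {
        String cohort = cohorts.get(user);
        return cohort == null ? null : cohort + "\t" + event;
    }

    public static void main(String[] args) {
        List<String> firstJobOutput = Arrays.asList("alice\t2012-07", "bob\t2012-08");
        Map<String, String> cohorts = loadCohorts(firstJobOutput);
        System.out.println(mapEvent(cohorts, "alice", "login"));
        System.out.println(mapEvent(cohorts, "carol", "login"));
    }
}
```

This only works while the whole mapping fits in each task's heap, which is the constraint discussed in the quoted messages below.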

Nonetheless, if anyone has an example of or more information about a
map-side join, I would love to see it.
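
For context, the core idea behind a map-side join is a merge of two inputs that are already sorted by the same key (and, in a real job, partitioned identically), so neither side has to be held in memory. The sketch below shows only that merge step in plain Java, under the assumption that each user has at most one cohort record. It is not Hadoop's implementation; as far as I know, the old API's org.apache.hadoop.mapred.join.CompositeInputFormat is the class that provides this kind of join. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a sorted merge (inner) join; not Hadoop code.
class SortedMergeJoin {

    // Each record is a {key, value} pair. Both lists must be sorted by key.
    // cohorts: at most one record per user. events: any number per user.
    static List<String> join(List<String[]> cohorts, List<String[]> events) {
        List<String> joined = new ArrayList<>();
        int i = 0; // cursor into the cohort side; it only moves forward
        for (String[] event : events) {
            // Skip cohort records whose key sorts before this event's key.
            while (i < cohorts.size() && cohorts.get(i)[0].compareTo(event[0]) < 0) {
                i++;
            }
            // Emit user, cohort, event when the keys match; otherwise drop the event.
            if (i < cohorts.size() && cohorts.get(i)[0].equals(event[0])) {
                joined.add(event[0] + "\t" + cohorts.get(i)[1] + "\t" + event[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> cohorts = Arrays.asList(
                new String[] {"alice", "2012-07"},
                new String[] {"bob", "2012-08"},
                new String[] {"dave", "2012-08"});
        List<String[]> events = Arrays.asList(
                new String[] {"alice", "login"},
                new String[] {"alice", "purchase"},
                new String[] {"carol", "login"},
                new String[] {"dave", "login"});
        for (String row : join(cohorts, events)) {
            System.out.println(row);
        }
    }
}
```

Because both cursors only move forward, each input is read once, which is why the sorted output of the first job is a natural fit for this style of join.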


- Mike

On Wed, Aug 22, 2012 at 11:57 PM, Michael Parker
<michael.g.parker@gmail.com> wrote:
> Thanks for the prompt reply!
> Unfortunately, it's not that small.
> I'm using the new API; are map-side joins accomplished using
> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/contrib/utils/join/package-summary.html?
> Are there any examples that use this package or map-side joins?
> The way I was thinking of doing it was to output the user-to-cohort
> mapping from the first MR as a SequenceFile, and then each mapper in
> the second MR could use a SequenceFile.Reader to find the cohort for a
> user. It seems reasonable, but is this actually doable? It's like a
> manual map-side join, I suppose, although likely not as elegant as
> what you were proposing.
> Thanks,
> Mike
> On Wed, Aug 22, 2012 at 10:27 PM, Harsh J <harsh@cloudera.com> wrote:
>> If it is a small set, you can load it onto distributed cache and then
>> onto the task's memory, or if it's pretty big, perhaps you can do a
>> map-side join?
>> On Thu, Aug 23, 2012 at 10:12 AM, Michael Parker
>> <michael.g.parker@gmail.com> wrote:
>>> Hi all,
>>> Is it possible to take a collection of sorted key-value pairs,
>>> generated from one MapReduce, and side-load them into another
>>> MapReduce, i.e. as it runs, the second MapReduce can look up the value
>>> for a given key computed by the first MapReduce?
>>> I need this for a cohort study -- one MR puts users into cohorts, and
>>> the second MR needs that user-to-cohort mapping to see how cohorts
>>> behave over time.
>>> Any help would be greatly appreciated. Thanks!
>>> - Mike
>> --
>> Harsh J
