hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Side-loading output from one MR into another?
Date Thu, 23 Aug 2012 05:27:52 GMT
If it is a small set, you can load it into the distributed cache and
then into the task's memory. If it's pretty big, perhaps you can do a
map-side join?
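
To make the two options concrete, here is a rough sketch in plain Python
rather than actual Hadoop code. It assumes the first job wrote
tab-separated "user<TAB>cohort" lines. The function names
(load_side_table, tag_events, merge_join) and the sample data are made up
for illustration. In a real job the first approach would roughly
correspond to shipping the file via DistributedCache and reading it in
the mapper's setup, and the second to something like
CompositeInputFormat, which needs both inputs sorted and partitioned the
same way.

```python
from typing import Dict, Iterable, Iterator, Tuple


def load_side_table(lines: Iterable[str]) -> Dict[str, str]:
    """Small-set option: read 'user<TAB>cohort' lines into a dict.

    Assumed file layout; a task would do this once, before processing
    any records, with the file shipped to it as a side file.
    """
    table: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user, cohort = line.split("\t", 1)
        table[user] = cohort
    return table


def tag_events(
    events: Iterable[Tuple[str, str]], cohorts: Dict[str, str]
) -> Iterator[Tuple[str, str]]:
    """Per-record lookup, as a map function would do: emit (cohort, event)."""
    for user, event in events:
        yield cohorts.get(user, "unknown"), event


def merge_join(
    left: Iterable[Tuple[str, str]], right: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str, str]]:
    """Big-set option: join two inputs that are both sorted by key.

    Walks the inputs in lockstep, so neither has to fit in memory.
    Assumes keys in `right` are unique; left keys with no match are dropped.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for key, value in left:
        # Advance the right side until it catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is not None and current[0] == key:
            yield key, value, current[1]


if __name__ == "__main__":
    cohorts = load_side_table(["alice\t2012-07\n", "bob\t2012-08\n"])
    for cohort, event in tag_events([("alice", "login"), ("carol", "login")], cohorts):
        print(cohort, event)
```

The dict version is simpler but every task holds the whole mapping in
memory. The merge version only ever holds one record from each side, at
the cost of requiring sorted, consistently partitioned inputs.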

On Thu, Aug 23, 2012 at 10:12 AM, Michael Parker
<michael.g.parker@gmail.com> wrote:
> Hi all,
>
> Is it possible to take a collection of sorted key-value pairs,
> generated from one MapReduce, and side-load them into another
> MapReduce, i.e. as it runs, the second MapReduce can look up the value
> for a given key computed by the first MapReduce?
>
> I need this for a cohort study -- one MR puts users into cohorts, and
> the second MR needs that user-to-cohort mapping to see how cohorts
> behave over time.
>
> Any help would be greatly appreciated. Thanks!
>
> - Mike



-- 
Harsh J
