From: Harsh J
Date: Thu, 23 Aug 2012 10:57:52 +0530
Subject: Re: Side-loading output from one MR into another?
To: user@hadoop.apache.org

If it is a small set, you can load it into the distributed cache and then
into the task's memory. If it's pretty big, perhaps you can do a map-side
join?

On Thu, Aug 23, 2012 at 10:12 AM, Michael Parker wrote:
> Hi all,
>
> Is it possible to take a collection of sorted key-value pairs,
> generated from one MapReduce, and side-load them into another
> MapReduce, i.e. as it runs, the second MapReduce can look up the value
> for a given key computed by the first MapReduce?
>
> I need this for a cohort study -- one MR puts users into cohorts, and
> the second MR needs that user-to-cohort mapping to see how cohorts
> behave over time.
>
> Any help would be greatly appreciated. Thanks!
>
> - Mike

--
Harsh J
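
For the small-set case, the in-memory lookup might look roughly like the sketch below. It uses only the Java standard library and no Hadoop classes, so it shows the lookup logic rather than a working job. In a real job the side file would be shipped through the distributed cache and read once per task before any records are mapped. The class and method names (CohortSideLoad, loadCohorts, tagEvent) are hypothetical, and the tab-separated "userId<TAB>cohort" and "userId<TAB>payload" line formats are assumptions about the data.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain Java standing in for the second job's mapper.
public class CohortSideLoad {

    // Reads "userId<TAB>cohort" lines (assumed format of the first job's
    // output) into an in-memory map. Lines without a tab are skipped.
    static Map<String, String> loadCohorts(BufferedReader reader) {
        Map<String, String> cohorts = new HashMap<>();
        reader.lines().forEach(line -> {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                cohorts.put(line.substring(0, tab), line.substring(tab + 1));
            }
        });
        return cohorts;
    }

    // Stand-in for map(): takes an event line "userId<TAB>payload" and
    // returns "cohort<TAB>payload". Returns null for a malformed line or
    // a user that has no cohort in the side-loaded map.
    static String tagEvent(Map<String, String> cohorts, String eventLine) {
        int tab = eventLine.indexOf('\t');
        if (tab <= 0) {
            return null;
        }
        String cohort = cohorts.get(eventLine.substring(0, tab));
        return cohort == null ? null : cohort + "\t" + eventLine.substring(tab + 1);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the first job's output.
        String sideFile = "alice\t2012-07\nbob\t2012-08\n";
        Map<String, String> cohorts =
                loadCohorts(new BufferedReader(new StringReader(sideFile)));

        // Made-up events standing in for the second job's input records.
        String[] events = {"alice\tlogin", "bob\tpurchase", "carol\tlogin"};
        for (String event : events) {
            System.out.println(tagEvent(cohorts, event));
        }
    }
}
```

This only makes sense while the whole user-to-cohort mapping fits in each task's heap. Past that point the map-side join is the better fit, since it streams both sorted inputs instead of holding one in memory.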