From: Aaron Kimball
Date: Wed, 5 May 2010 18:04:08 -0700
Subject: Re: Hadoop Data Sharing
To: general@hadoop.apache.org

Renato,

In general, if you need to perform a multi-pass MapReduce workflow, each pass materializes its
output to files. The subsequent pass then reads those same files back in as input. This allows the workflow to restart from the last "checkpoint" if it gets interrupted. There is no persistent in-memory distributed storage feature in Hadoop that would allow a MapReduce job to post results to memory for consumption by a subsequent job.

So you would just read your initial data from /input and write your interim results to /iteration0. Then the next pass reads from /iteration0 and writes to /iteration1, etc.

If your data is reasonably small and you think it could fit in memory somewhere, then you could experiment with using other distributed key-value stores (memcached, HBase, Cassandra, etc.) to hold intermediate results. But this will require some integration work on your part.

- Aaron

On Wed, May 5, 2010 at 8:29 AM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hi everyone, I have recently started to play around with Hadoop, but I am
> running into some "design" problems.
> I need to make a loop to execute the same job several times, and in each
> iteration get the processed values (not using a file, because then I would
> need to read it back in). I was using a static vector in my main class
> (the one that iterates and executes the job in each iteration) to retrieve
> those values, and it did work while I was using standalone mode. Now I
> have tried it on a pseudo-distributed setup, and it obviously is not
> working.
> Any suggestions, please?
>
> Thanks in advance,
>
> Renato M.
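The checkpoint pattern Aaron describes can be sketched as a small driver loop. This is a Hadoop-agnostic illustration in plain Python: local directories stand in for HDFS paths, and `run_pass` / `run_workflow` are made-up names for this sketch, not Hadoop API. The /input and /iterationN directory names follow the thread.

```python
import os
import tempfile

def run_pass(in_dir, out_dir, fn):
    """One 'job': read every line under in_dir, apply fn, write to out_dir."""
    os.makedirs(out_dir)
    with open(os.path.join(out_dir, "part-00000"), "w") as out:
        for name in sorted(os.listdir(in_dir)):
            with open(os.path.join(in_dir, name)) as f:
                for line in f:
                    out.write(fn(line.strip()) + "\n")

def run_workflow(base, passes, fn):
    """Chain passes through iteration0, iteration1, ... directories,
    resuming after the last iteration directory that already exists."""
    done = 0
    while os.path.isdir(os.path.join(base, "iteration%d" % done)):
        done += 1  # everything up to iteration{done-1} is checkpointed
    for i in range(done, passes):
        in_dir = os.path.join(base, "input" if i == 0 else "iteration%d" % (i - 1))
        run_pass(in_dir, os.path.join(base, "iteration%d" % i), fn)
    return os.path.join(base, "iteration%d" % (passes - 1))

# Demo: three passes, each doubling every value.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "input"))
with open(os.path.join(base, "input", "part-00000"), "w") as f:
    f.write("1\n2\n")
out = run_workflow(base, 3, lambda s: str(int(s) * 2))
```

In a real Hadoop driver, each loop body would configure and submit a Job with the input path set to the previous iteration's output path; the resume check would test directory existence on HDFS instead of the local filesystem.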