hadoop-general mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Hadoop Data Sharing
Date Tue, 11 May 2010 13:38:18 GMT
Thanks Aaron! I was thinking the same after doing some reading.
Man, what about serializing the objects? Do you think that would be a good idea?
Thanks again.

Renato M.


2010/5/5 Aaron Kimball <aaron@cloudera.com>

> Renato,
>
> In general, if you need to perform a multi-pass MapReduce workflow, each pass
> materializes its output to files. The subsequent pass then reads those same
> files back in as input. This allows the workflow to start at the last
> "checkpoint" if it gets interrupted. There is no persistent in-memory
> distributed storage feature in Hadoop that would allow a MapReduce job to
> post results to memory for consumption by a subsequent job.
>
> So you would just read your initial data from /input, and write your interim
> results to /iteration0. Then the next pass reads from /iteration0 and writes
> to /iteration1, etc.
>
> If your data is reasonably small and you think it could fit in memory
> somewhere, then you could experiment with using other distributed key-value
> stores (memcached, hbase, cassandra, etc.) to hold intermediate results.
> But this will require some integration work on your part.
> - Aaron
>
> On Wed, May 5, 2010 at 8:29 AM, Renato Marroquín Mogrovejo <
> renatoj.marroquin@gmail.com> wrote:
>
> > Hi everyone, I have recently started to play around with Hadoop, but I am
> > running into some "design" problems.
> > I need to make a loop to execute the same job several times, and in each
> > iteration get the processed values (not using a file, because I would need
> > to read it). I was using a static vector in my main class (the one that
> > iterates and executes the job in each iteration) to retrieve those values,
> > and it did work while I was using standalone mode. Now I tried to test it
> > in pseudo-distributed mode and obviously it is not working.
> > Any suggestions, please?
> >
> > Thanks in advance,
> >
> >
> > Renato M.
> >
>
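
A minimal sketch of the file-based chaining Aaron describes above, written in
Python against the local filesystem rather than Hadoop itself. The names
run_pass and run_workflow, the "double every value" step, and the part-00000
file name are illustrative assumptions, not anything from the thread. In a real
workflow, run_pass would submit a MapReduce job and the directories would live
in HDFS. The loop skips any iteration whose output directory already exists,
which is one way to get the restart-from-last-checkpoint behaviour.

```python
import os
import tempfile


def run_pass(input_dir, output_dir):
    """Hypothetical stand-in for one MapReduce pass.

    Reads every file in input_dir, doubles each integer line, and writes
    the results under the same file names in output_dir.
    """
    # Write to a temporary directory first so that a half-finished pass
    # is never mistaken for a completed checkpoint.
    tmp_dir = output_dir + ".tmp"
    os.makedirs(tmp_dir, exist_ok=True)
    for name in sorted(os.listdir(input_dir)):
        src_path = os.path.join(input_dir, name)
        dst_path = os.path.join(tmp_dir, name)
        with open(src_path) as src, open(dst_path, "w") as dst:
            for line in src:
                dst.write(f"{int(line) * 2}\n")
    # Publish the output only once the whole pass has succeeded.
    os.rename(tmp_dir, output_dir)


def run_workflow(base_dir, iterations):
    """Chain passes: input -> iteration0 -> iteration1 -> ...

    Returns the directory holding the output of the final pass.
    """
    current = os.path.join(base_dir, "input")
    for i in range(iterations):
        output = os.path.join(base_dir, f"iteration{i}")
        # An existing output directory is treated as a finished
        # checkpoint, so an interrupted workflow resumes after it.
        if not os.path.isdir(output):
            run_pass(current, output)
        current = output
    return current


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    os.makedirs(os.path.join(base, "input"))
    with open(os.path.join(base, "input", "part-00000"), "w") as f:
        f.write("1\n2\n3\n")
    final_dir = run_workflow(base, 3)
    with open(os.path.join(final_dir, "part-00000")) as f:
        print(final_dir, f.read().split())
```

The driver only ever passes directory names between iterations, so nothing
depends on state held in the driver's own memory. That is why this pattern
keeps working once the job's tasks run in separate processes.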
