hadoop-common-user mailing list archives

From "Ashish Venugopal" <...@andrew.cmu.edu>
Subject Re: MR use case where each reducer/mapper receives different parameters
Date Wed, 13 Aug 2008 21:45:35 GMT
Also, just to clarify one point:
I am using Hadoop On Demand (HOD), which means that to run a job I first have
to allocate a cluster. I am using the "hod script" mechanism, where the
cluster is allocated for the running time of my hod script. If my script
could schedule multiple MR jobs and only relinquish control once all of them
are done, I could simply schedule one MR job per parameter setting.
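A minimal sketch of such a hod script, assuming Hadoop Streaming with Python mapper/reducer scripts: it submits one streaming job per parameter setting and blocks until every job has exited. Streaming's `-cmdenv name=value` option exports a variable into the mapper/reducer environment, which is one way to hand each job its parameters. The jar path, script names, parameter names and HDFS paths below are all hypothetical placeholders.

```python
import subprocess

# Hypothetical path to the streaming jar; adjust for your installation.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"

# Hypothetical parameter settings: one MR job is submitted per entry.
PARAM_SETTINGS = [
    {"LEARNING_RATE": "0.1", "INIT_SEED": "1"},
    {"LEARNING_RATE": "0.01", "INIT_SEED": "2"},
]


def build_command(params, index, input_path, output_root):
    """Build the streaming command line for one parameter setting."""
    cmd = [
        "hadoop", "jar", STREAMING_JAR,
        "-input", input_path,
        # Every job needs its own output directory.
        "-output", f"{output_root}/run-{index}",
        # Hypothetical mapper/reducer scripts, shipped to the nodes with -file.
        "-mapper", "optimize_mapper.py",
        "-reducer", "optimize_reducer.py",
        "-file", "optimize_mapper.py",
        "-file", "optimize_reducer.py",
    ]
    # -cmdenv sets name=value in the environment of the streaming
    # mapper/reducer processes, so the scripts can read their
    # parameters from os.environ.
    for name, value in sorted(params.items()):
        cmd += ["-cmdenv", f"{name}={value}"]
    return cmd


def run_all(settings, input_path, output_root):
    """Submit every job at once, then block until all of them finish.

    Assumes the `hadoop` client on PATH already talks to the cluster
    that HOD allocated for this script.
    """
    procs = [
        subprocess.Popen(build_command(p, i, input_path, output_root))
        for i, p in enumerate(settings)
    ]
    # wait() returns each job's exit code; the list is only complete
    # once every child process has exited.
    return [proc.wait() for proc in procs]


def main():
    codes = run_all(PARAM_SETTINGS, "data/train", "out/sweep")
    if any(codes):
        raise SystemExit(f"at least one job failed: {codes}")
```

Calling `main()` as the last step of the hod script keeps the allocation alive until every job is done; the mapper would then read, e.g., `os.environ["LEARNING_RATE"]`. With a FIFO job scheduler the submitted jobs may still run one after another rather than side by side, but the script still waits for all of them.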

Ashish


On Wed, Aug 13, 2008 at 2:08 PM, Ashish Venugopal <arv@andrew.cmu.edu> wrote:

> Hi, I need to implement a specific use case that comes up often in the
> machine learning / nlp community. Often we want to run some kind of
> optimization process on a data set, but we want to run the optimization at
> several different initial parameters. While this is not the usual MR
> paradigm of splitting up a large task and then recombining the partial
> outputs, I would like to use Hadoop to handle the parallelization.
> The streaming documentation page
> (http://hadoop.apache.org/core/docs/current/streaming.html) mentions that
> streaming can be used to create jobs with multiple different parameters,
> but it does not give an example, so it's not clear to me how to give each
> mapper (or each reducer) a specific set of parameters. If each
> mapper/reducer had access to some kind of job index number, I could
> potentially write a side file that maps ids->params, but this seems clumsy.
> The only solution I have now is for my map phase to replicate the data,
> pairing each copy with a key that represents a different parameter
> setting. Each reducer then sees key-value pairs: by reading the key it can
> get its parameters, and the value holds the data. Are there any other
> solutions?
>
> Thanks!
>
> Ashish
>
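The replicate-and-key approach described in the original message can be sketched as a Python streaming mapper/reducer pair. This is a minimal illustration, not a tested job: `PARAMS`, `optimize` and `main` are hypothetical placeholders. It relies on streaming splitting each output line at the first tab into key and value, and on reducer input arriving sorted by key, so all copies for one parameter id are contiguous.

```python
import itertools
import sys

# Hypothetical parameter table: key -> parameter setting.
PARAMS = {
    "p0": {"learning_rate": 0.1},
    "p1": {"learning_rate": 0.01},
    "p2": {"learning_rate": 0.001},
}


def mapper(lines):
    """Emit one copy of every input record per parameter setting,
    keyed by the parameter id (tab-separated, as streaming expects)."""
    for line in lines:
        record = line.rstrip("\n")
        for pid in sorted(PARAMS):
            yield f"{pid}\t{record}"


def optimize(data, learning_rate):
    """Placeholder for the real optimization; it only reports the
    parameter it received and how many records it saw."""
    return f"lr={learning_rate} n={len(data)}"


def reducer(lines):
    """Reducer input is sorted by key, so records for one parameter id
    arrive together; group them and run one optimization per group."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for pid, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        data = [value for _, value in group]
        yield f"{pid}\t{optimize(data, **PARAMS[pid])}"


def main(stage_name, stdin=sys.stdin, stdout=sys.stdout):
    """Entry point, e.g. `-mapper "sweep.py map" -reducer "sweep.py reduce"`
    with a hypothetical sweep.py that calls main(sys.argv[1])."""
    stage = mapper if stage_name == "map" else reducer
    for out in stage(stdin):
        stdout.write(out + "\n")
```

Locally, `reducer(sorted(mapper(lines)))` mimics the shuffle. Note that the data volume grows by a factor equal to the number of parameter settings, and the default hash partitioner does not guarantee that each parameter id lands on a different reducer.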
