hadoop-mapreduce-user mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: Programming Multiple rounds of mapreduce
Date Mon, 13 Jun 2011 22:13:28 GMT
Thanks Matt,

Arko, if you plan to use Oozie, you can have a simple coordinator job that
does this. For example, the following schedules a WF every 5 minutes that
consumes the output produced by the previous run; you just need to provide
the initial data:

Thxs.

Alejandro

----
<coordinator-app name="coord-1" frequency="${coord:minutes(5)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <controls>
    <concurrency>1</concurrency>
  </controls>

  <datasets>
    <dataset name="data" frequency="${coord:minutes(5)}"
             initial-instance="${start}" timezone="UTC">
      <uri-template>${nameNode}/user/${coord:user()}/examples/${dataRoot}/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}</uri-template>
    </dataset>
  </datasets>

  <input-events>
    <data-in name="input" dataset="data">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>

  <output-events>
    <data-out name="output" dataset="data">
      <instance>${coord:current(1)}</instance>
    </data-out>
  </output-events>

  <action>
    <workflow>
      <app-path>${nameNode}/user/${coord:user()}/examples/apps/subwf-1</app-path>
      <configuration>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>examplesRoot</name>
          <value>${examplesRoot}</value>
        </property>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('output')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
------
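
For completeness, here is a minimal sketch of what the workflow at the app-path above (subwf-1) might look like. The coordinator passes inputDir and outputDir down as properties, so the workflow only has to point one map-reduce round at them. The action name "mr-round", the kill node, and the identity mapper/reducer classes are placeholders assumed for illustration, not something taken from a real deployment; substitute the classes for your own round.

```xml
<workflow-app name="subwf-1" xmlns="uri:oozie:workflow:0.1">
  <start to="mr-round"/>

  <!-- One round of map-reduce; "mr-round" is a hypothetical action name. -->
  <action name="mr-round">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <!-- Placeholder classes: replace with the mapper/reducer of your round. -->
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
        </property>
        <!-- inputDir and outputDir are the properties set by the coordinator. -->
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Round failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Because the coordinator's output instance is current(1) of the same dataset that supplies current(0) as input, the directory written by one run should be the one the next run reads.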

On Mon, Jun 13, 2011 at 3:01 PM, GOEKE, MATTHEW (AG/1000) <
matthew.goeke@monsanto.com> wrote:

> If you know for certain that it needs to be split into multiple work units
> I would suggest looking into Oozie. Easy to install, lightweight, low
> learning curve... for my purposes it's been very helpful so far. I am also
> fairly certain you can chain multiple job confs into the same run, but I have
> not actually tried that, so I can't promise it is easy or possible.
>
> http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-oozie/
>
> If you are not running CDH3u0 then you can also get the tarball and
> documentation directly here:
> https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
>
> Matt
>
> -----Original Message-----
> From: Marcos Ortiz [mailto:mlortiz@uci.cu]
> Sent: Monday, June 13, 2011 4:57 PM
> To: mapreduce-user@hadoop.apache.org
> Cc: Arko Provo Mukherjee
> Subject: Re: Programming Multiple rounds of mapreduce
>
> Well, you can define a job for each round and then define the running
> workflow, based on your implementation, to chain your jobs.
>
> On 6/13/2011 5:46 PM, Arko Provo Mukherjee wrote:
> > Hello,
> >
> > I am trying to write a program where I need to write multiple rounds
> > of map and reduce.
> >
> > The output of the last round of map-reduce must be fed into the input
> > of the next round.
> >
> > Can anyone please guide me to any link / material that can teach me as
> > to how I can achieve this.
> >
> > Thanks a lot in advance!
> >
> > Thanks & regards
> > Arko
>
> --
> Marcos Luís Ortíz Valmaseda
>  Software Engineer (UCI)
>  http://marcosluis2186.posterous.com
>  http://twitter.com/marcosluis2186
