hadoop-mapreduce-user mailing list archives

From "Some Body" <someb...@squareplanet.de>
Subject MultipleOutputs or Partitioner
Date Mon, 10 May 2010 12:08:29 GMT
Hi,

I'm trying to understand how to generate multiple outputs in my reducer (using 0.20.2+228).
Do I need MultipleOutputs, or should I partition my output in the mapper?

My reducer currently receives key/value input pairs like the following, all of which end up in my part-r-00000 file:

    hostA_VarX_2010-05-01_morning      <FLOATVAL>
    hostA_VarY_2010-05-01_morning      <FLOATVAL>
    hostA_VarX_2010-05-01_afternoon    <FLOATVAL>
    hostA_VarY_2010-05-01_afternoon    <FLOATVAL>
    .....
    hostB_VarX_2010-05-01_morning      <FLOATVAL>
    hostB_VarY_2010-05-01_morning      <FLOATVAL>
    hostB_VarX_2010-05-01_afternoon    <FLOATVAL>
    hostB_VarY_2010-05-01_afternoon    <FLOATVAL>
    .....
    hostA_VarX_2010-05-02_morning      <FLOATVAL>
    hostA_VarY_2010-05-02_morning      <FLOATVAL>
    hostA_VarX_2010-05-02_afternoon    <FLOATVAL>
    hostA_VarY_2010-05-02_afternoon    <FLOATVAL>
    .....
    hostB_VarX_2010-05-02_morning      <FLOATVAL>
    hostB_VarY_2010-05-02_morning      <FLOATVAL>
    hostB_VarX_2010-05-02_afternoon    <FLOATVAL>
    hostB_VarY_2010-05-02_afternoon    <FLOATVAL>
    .....

But instead of one output file, I want one output file per day/group, e.g.:

    2010-05-01_morning.txt
    2010-05-01_afternoon.txt

Each <date>_<time>.txt file would contain all keys/values for all hosts and variable names.

Thanks,
Alan
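
The grouping asked for above depends only on the trailing date and time-of-day fields of each key. As a minimal sketch in plain Java (not a Hadoop job), the following shows how that suffix could be derived from a key and mapped to the desired file name. The class and method names (KeyGrouping, groupOf, outputFileFor, groupByFile) are hypothetical, and the code assumes every key ends in an underscore-separated date and time-of-day, as in the sample keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: it is not part of Hadoop.
class KeyGrouping {

    // Returns the trailing "<date>_<time>" part of a key such as
    // "hostA_VarX_2010-05-01_morning" (assumed key layout).
    static String groupOf(String key) {
        int timeSep = key.lastIndexOf('_');              // underscore before the time of day
        int dateSep = key.lastIndexOf('_', timeSep - 1); // underscore before the date
        if (timeSep <= 0 || dateSep < 0) {
            throw new IllegalArgumentException("Unexpected key format: " + key);
        }
        return key.substring(dateSep + 1);
    }

    // Maps a key to the per-group file name, e.g. "2010-05-01_morning.txt".
    static String outputFileFor(String key) {
        return groupOf(key) + ".txt";
    }

    // Groups keys by target file name, keeping input order within each file.
    static Map<String, List<String>> groupByFile(List<String> keys) {
        Map<String, List<String>> byFile = new TreeMap<>();
        for (String key : keys) {
            byFile.computeIfAbsent(outputFileFor(key), f -> new ArrayList<>()).add(key);
        }
        return byFile;
    }
}
```

In a real job, the string returned by groupOf could serve as the per-group output name in the reducer or as the input to a partitioning decision. Which of the two mechanisms fits 0.20.2+228 is the open question of this message.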