Subject: Multiple Outputs Not Being Written to File
From: Geoffry Roberts <geoffry.roberts@gmail.com>
To: mapreduce-user@hadoop.apache.org
Date: Fri, 6 May 2011 10:55:44 -0700

All,

I am attempting to take a large file and split it up into a series of smaller files. I want the smaller files to be named based on values taken from the large file. I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs to do this.

The job runs without error and produces a set of files as expected, and each file is named as expected. But most of the files are empty; apparently, no data was written to them. The fact that the files were created at all should confirm that data was coming in from the mapper. My reducer counts as it iterates through the values, then logs the count, and I am seeing reasonable counts in my logs. The number of lines in an output file should equal the count, yet I have counts but no lines.

What could be causing this?
My Mapper:

    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] ss = value.toString().split(",");
        String locale = ss[F.DEPARTURE_LOCALE];
        ctx.write(new Text(locale), value);
    }

My Reducer:

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context ctx) throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, Text>(ctx);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        int k = 0;
        /*
         * The key at this point can have blanks and slashes. Let us get rid
         * of both.
         */
        String blankless = key.toString().replace(' ', '+');
        String path = blankless.replace("/", "");
        try {
            for (Text value : values) {
                k++;
                String[] ss = value.toString().split(F.DELIMITER);
                String id = ss[F.ID];
                String[] sslessid = Arrays.copyOfRange(ss, 1, ss.length);
                String line = UT.array2String(sslessid);
                // An output file is being created,
                mos.write(new Text(id), new Text(line), path);
            }
        } catch (NullPointerException e) {
            LOG.error("<br/>" + "blankless=" + blankless);
            LOG.error("<br/>" + "values=" + values.toString());
        }
        // In my logs, I see reasonable counts even when the output file is empty.
        LOG.info("<br/>key=" + path + " count=" + k);
    }

--
Geoffry Roberts
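P.S. Rereading the MultipleOutputs Javadoc, I notice its example calls mos.close() from the reducer's cleanup(), and my reducer has no cleanup() at all. I cannot say for certain that this is my problem, but if the writer buffers records until close(), never closing it would match my symptoms exactly: the file gets created, the counts climb, and nothing is flushed. A toy illustration of that failure mode (plain Java, not Hadoop; BufferingWriter is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an output writer that buffers records in memory and only
// moves them into the "file" when close() is called. If close() is never
// called, the file stays empty even though every write() succeeded.
public class BufferingWriter {
    private final List<String> buffer = new ArrayList<>(); // written, not yet flushed
    private final List<String> file = new ArrayList<>();   // stands in for the output file

    public void write(String record) {
        buffer.add(record); // write() succeeds, so my counters keep climbing
    }

    public void close() {
        file.addAll(buffer); // nothing reaches the file until close()
        buffer.clear();
    }

    public int recordsOnDisk() {
        return file.size();
    }

    public static void main(String[] args) {
        BufferingWriter w = new BufferingWriter();
        for (int i = 0; i < 5; i++) w.write("line " + i);
        System.out.println("before close: " + w.recordsOnDisk()); // prints 0
        w.close();
        System.out.println("after close: " + w.recordsOnDisk());  // prints 5
    }
}
```

If this is indeed the cause, then overriding cleanup(Context) in my reducer and calling mos.close() there should fix it.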