Subject: Multiple Outputs Not Being Written to File
From: Geoffry Roberts <geoffry.roberts@gmail.com>
To: mapreduce-user@hadoop.apache.org
Date: Fri, 6 May 2011 10:55:44 -0700

All,

I am attempting to take a large file and split it up into a series of smaller files. I want the smaller files to be named based on values taken from the large file. I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs to do this.

The job runs without error and produces a set of files as expected, and each file is named as expected. But most of the files are empty; apparently, no data was written to them. The fact that the files were created at all should confirm that data was coming in from the mapper. My reducer counts as it iterates through the values, then logs the count, and I am seeing reasonable counts in my logs. The number of lines in an output file should equal the count, yet I have counts but no lines.

What could be causing this?
My Mapper:

    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] ss = value.toString().split(",");
        String locale = ss[F.DEPARTURE_LOCALE];
        ctx.write(new Text(locale), value);
    }

My Reducer:

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context ctx) throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, Text>(ctx);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        int k = 0;
        /*
         * The key at this point can have blanks and slashes. Let us get rid
         * of both.
         */
        String blankless = key.toString().replace(' ', '+');
        String path = blankless.replace("/", "");
        try {
            for (Text value : values) {
                k++;
                String[] ss = value.toString().split(F.DELIMITER);
                String id = ss[F.ID];
                String[] sslessid = Arrays.copyOfRange(ss, 1, ss.length);
                String line = UT.array2String(sslessid);
                // An output file is being created,
                mos.write(new Text(id), new Text(line), path);
            }
        } catch (NullPointerException e) {
            LOG.error("<br/>" + "blankless=" + blankless);
            LOG.error("<br/>" + "values=" + values.toString());
        }
        // In my logs, I see reasonable counts even when the output file is empty.
        LOG.info("<br/>key=" + path + " count=" + k);
    }

--
Geoffry Roberts
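P.S. Rereading the MultipleOutputs Javadoc, I notice its example calls mos.close() from the reducer's cleanup(), and my reducer has no cleanup() at all. I cannot say for certain that this is my problem, but if the writer buffers records until close(), never closing it would match my symptoms exactly: the file gets created, the counts climb, and nothing is flushed. A toy illustration of that failure mode (plain Java, not Hadoop; BufferingWriter is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an output writer that buffers records in memory and only
// moves them into the "file" when close() is called. If close() is never
// called, the file stays empty even though every write() succeeded.
public class BufferingWriter {
    private final List<String> buffer = new ArrayList<>(); // written, not yet flushed
    private final List<String> file = new ArrayList<>();   // stands in for the output file

    public void write(String record) {
        buffer.add(record); // write() succeeds, so my counters keep climbing
    }

    public void close() {
        file.addAll(buffer); // nothing reaches the file until close()
        buffer.clear();
    }

    public int recordsOnDisk() {
        return file.size();
    }

    public static void main(String[] args) {
        BufferingWriter w = new BufferingWriter();
        for (int i = 0; i < 5; i++) w.write("line " + i);
        System.out.println("before close: " + w.recordsOnDisk()); // prints 0
        w.close();
        System.out.println("after close: " + w.recordsOnDisk());  // prints 5
    }
}
```

If this is indeed the cause, then overriding cleanup(Context) in my reducer and calling mos.close() there should fix it.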