Subject: NullPointerException when trying to write mapper output
From: Marcelo Elias Del Valle
To: user@hadoop.apache.org
Date: Thu, 24 Oct 2013 15:19:23 -0200

I am using Hadoop 1.0.3 on Amazon EMR. I have a map/reduce job configured like this:

    private static final String TEMP_PATH_PREFIX =
            System.getProperty("java.io.tmpdir") + "/dmp_processor_tmp";
    ...
    private Job setupProcessorJobS3() throws IOException, DataGrinderException {
        String inputFiles = System.getProperty(DGConfig.INPUT_FILES);
        Job processorJob = new Job(getConf(), PROCESSOR_JOBNAME);
        processorJob.setJarByClass(DgRunner.class);
        processorJob.setMapperClass(EntityMapperS3.class);
        processorJob.setReducerClass(SelectorReducer.class);
        processorJob.setOutputKeyClass(Text.class);
        processorJob.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(processorJob, new Path(TEMP_PATH_PREFIX));
        processorJob.setOutputFormatClass(TextOutputFormat.class);
        processorJob.setInputFormatClass(NLineInputFormat.class);
        FileInputFormat.setInputPaths(processorJob, inputFiles);
        NLineInputFormat.setNumLinesPerSplit(processorJob, 10000);
        return processorJob;
    }

In my mapper class, I have:

    private Text outkey = new Text();
    private Text outvalue = new Text();
    ...
    outkey.set(entity.getEntityId().toString());
    outvalue.set(input.getId().toString());
    printLog("context write");
    context.write(outkey, outvalue);

This last line (`context.write(outkey, outvalue);`) causes the exception below. Of course, both `outkey` and `outvalue` are not null.

    2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 context write
    2013-10-24 05:48:48,422 ERROR com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Error on entitymapper for input: 03a07858-4196-46dd-8a2c-23dd824d6e6e
    java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1293)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1210)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
        at org.apache.hadoop.io.Text.write(Text.java:281)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:698)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at com.s1mbi0se.grinder.core.mapred.EntityMapper.map(EntityMapper.java:78)
        at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:34)
        at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:14)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
    2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperS3 (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 Entity Mapper end

The first records on each task are processed fine. At some point during task processing, though, I start getting this exception over and over, and from then on the task doesn't process a single record.
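To rule out our data, I am considering running a stripped-down probe mapper over the same input. This is a hypothetical class, not part of our code: it just echoes each NLineInputFormat record through the same Text/context.write path, so if it dies with the same trace, the records themselves are not the problem.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical diagnostic mapper (not our real EntityMapperS3). It writes
    // one small, known-good key/value pair per input line, exercising the
    // same map output path that throws the NPE above.
    public class WriteProbeMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text outkey = new Text();
        private final Text outvalue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            outkey.set(Long.toString(key.get())); // byte offset of the line
            outvalue.set(value);                  // the raw input line
            context.write(outkey, outvalue);
        }
    }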
I also tried setting `TEMP_PATH_PREFIX` to `"s3://mybucket/dmp_processor_tmp"`, but the same thing happened.

Any idea why this is happening? What could be preventing Hadoop from being able to write its output?
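Since the trace dies inside MapOutputBuffer rather than in my code, the map-side sort buffer is another variable I plan to isolate. A minimal sketch of what I would change in the job setup, assuming nothing beyond the standard Hadoop 1.x `io.sort.mb` property (200 is an arbitrary test value, not a recommendation):

    // Hypothetical variant of setupProcessorJobS3(): identical to the setup
    // above, except the Hadoop 1.x map-side sort buffer is raised before the
    // Job is created. Configuration is org.apache.hadoop.conf.Configuration.
    Configuration conf = getConf();
    conf.setInt("io.sort.mb", 200); // default is 100 MB in Hadoop 1.x
    Job processorJob = new Job(conf, PROCESSOR_JOBNAME);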