From: Josh Wills
Date: Mon, 1 Jun 2015 17:33:37 -0400
Subject: Re: Iteratable Bug?
To: user@crunch.apache.org
Reply-To: user@crunch.apache.org
Mailing-List: contact user-help@crunch.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a113a820e58924505177b95cd

--001a113a820e58924505177b95cd
Content-Type: text/plain; charset=UTF-8

I just ran the following test and everything works fine. However, I did get
the exception you saw when a version of my code called it.next() twice
inside the while loop. I noticed that the version of the MergedAttrMapping
MapFn you sent in your original email doesn't compile (the "I" variable is
capitalized where it is declared but lowercase where it is used), so I'm
wondering whether the real code contains the same mistake I just made:
calling it.next() a second time after verifying that "i" wasn't null.

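The failure mode described above can be reproduced without Crunch or Hadoop. What follows is a minimal sketch in plain Java, not the code from either pipeline: the class name DoubleNextDemo and the methods sumOnce and sumTwice are made up for illustration, and an ordinary in-memory list stands in for the reducer's value iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; nothing here is part of the Crunch API.
class DoubleNextDemo {

    // Correct pattern: exactly one next() call per hasNext() check.
    static int sumOnce(Iterable<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            Integer i = it.next();
            if (i != null) {
                sum += i;
            }
        }
        return sum;
    }

    // Buggy pattern: the second next() consumes an element that
    // hasNext() never vouched for.
    static int sumTwice(Iterable<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            if (it.next() != null) {
                sum += it.next(); // BUG: advances the iterator a second time
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumOnce(values)); // prints 6
        try {
            sumTwice(values);
        } catch (NoSuchElementException e) {
            System.out.println("sumTwice ran past the last value");
        }
    }
}
```

With three values, the second pass of the buggy loop checks hasNext() once but calls next() twice, so the second call runs off the end of the data. With an even number of non-null values the same loop finishes without an exception and silently returns a wrong sum, which is one reason this mistake can go unnoticed in testing.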
import com.google.common.collect.ImmutableList;
import org.apache.crunch.MapFn;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.test.TemporaryPath;
import org.apache.crunch.test.TemporaryPaths;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.conf.Configuration;
import org.junit.Rule;
import org.junit.Test;

import java.util.Iterator;
import java.util.List;

public class NullIterableIT {

  @Rule
  public TemporaryPath tmpDir = TemporaryPaths.create();

  @Test
  public void testNullIterables() throws Exception {
    run(tmpDir.getDefaultConfiguration());
  }

  public static void run(Configuration conf) {
    Pipeline p = new MRPipeline(NullIterableIT.class, conf);
    // Key "c" carries a null value so the grouped iterable contains a null.
    List<Pair<String, Integer>> in = ImmutableList.of(
        Pair.of("a", 1), Pair.of("b", 1), Pair.of("b", 2),
        Pair.of("c", (Integer) null));
    PTable<String, Integer> input =
        p.create(in, Avros.tableOf(Avros.strings(), Avros.ints()));
    input.groupByKey().mapValues(new MapFn<Iterable<Integer>, Integer>() {
      @Override
      public Integer map(Iterable<Integer> input) {
        int sum = 0;
        Iterator<Integer> it = input.iterator();
        while (it.hasNext()) {
          // Exactly one next() per hasNext() check.
          Integer i = it.next();
          if (i != null) {
            sum += i;
          }
        }
        return sum;
      }
    }, Avros.ints()).materialize().iterator();
    p.done();
  }
}

On Mon, Jun 1, 2015 at 5:00 PM, David Ortiz <dortiz@videologygroup.com> wrote:

> Here are the steps the pipeline goes through between the join and this
> map fn:
>
> · Outer Join yielding PTable<Integer, Pair<Integer, AttrMapping>>
> · MapFn which outputs each value entry from the join, with the fields
>   reversed, creating PTable<AttrMapping, Integer>
> · group by key
> · MapFn in question
>
> From: Josh Wills [mailto:jwills@cloudera.com]
> Sent: Monday, June 01, 2015 4:53 PM
> To: user@crunch.apache.org
> Subject: Re: Iteratable Bug?
>
> Okay...I think I need some more context around what is preceding this
> function. What is everything that happens between the outer join (using a
> JoinStrategy?) and this MapFn call?
>
> On Mon, Jun 1, 2015 at 4:47 PM, David Ortiz <dortiz@videologygroup.com> wrote:
>
> I can’t say for sure it was an empty one that threw the exception, but
> that is processing the output of an outer join, so they definitely exist.
>
> From: Josh Wills [mailto:jwills@cloudera.com]
> Sent: Monday, June 01, 2015 4:42 PM
> To: user@crunch.apache.org
> Subject: Re: Iteratable Bug?
>
> I can't replicate it easily in master-- is the iterable in question empty
> by any chance?
>
> On Mon, Jun 1, 2015 at 2:34 PM, David Ortiz <dortiz@videologygroup.com> wrote:
>
> This is 0.11.0-cdh5.3.2
>
> From: Josh Wills [mailto:jwills@cloudera.com]
> Sent: Monday, June 01, 2015 2:33 PM
> To: user@crunch.apache.org
> Subject: Re: Iteratable Bug?
>
> Yeah, that's odd. This is 0.12? Let me see if I can reproduce it.
>
> J
>
> On Mon, Jun 1, 2015 at 2:31 PM, David Ortiz <dortiz@videologygroup.com> wrote:
>
> Hello Josh,
>
> Sorry, it is the next() that is throwing the exception.
>
> Thanks,
> Dave
>
> From: Josh Wills [mailto:jwills@cloudera.com]
> Sent: Monday, June 01, 2015 2:30 PM
> To: user@crunch.apache.org
> Subject: Re: Iteratable Bug?
>
> Hey David,
>
> It seems like it. Which line in the function is throwing the exception?
> Is it the hasNext(), or the next()?
>
> J
>
> On Mon, Jun 1, 2015 at 2:07 PM, David Ortiz <dortiz@videologygroup.com> wrote:
>
> Hello everyone,
>
> I noticed the following does not work in my pipeline:
>
> @Override
> public MergedAttrMapping map(
>     Pair<AttrMapping, Iterable<Integer>> attrMappingIterablePair) {
>   MergedAttrMapping out = mapper.map(attrMappingIterablePair.first());
>   StringBuilder ids = new StringBuilder();
>
>   Iterator<Integer> it = attrMappingIterablePair.second().iterator();
>   while (it.hasNext()) {
>     Integer I = it.next();
>     if (i != null && i != 0) {
>       ids.append(i);
>       ids.append('|');
>     }
>   }
>
>   if (ids.length() > 0) {
>     ids.deleteCharAt(ids.length() - 1);
>   }
>
>   out.setIds(ids.toString());
>   return out;
> }
>
> Causing the following exception:
>
> Error: java.util.NoSuchElementException: iterate past last value
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:235)
>   at org.apache.crunch.types.avro.AvroPairConverter$AvroWrappedIterable$1.next(AvroPairConverter.java:103)
>   at org.apache.crunch.types.PGroupedTableType$HoldLastIterator.next(PGroupedTableType.java:84)
>   at com.videologygroup.crunch.FteWarehouse$1.map(FteWarehouse.java:268)
>   at com.videologygroup.crunch.FteWarehouse$1.map(FteWarehouse.java:257)
>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>   at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:113)
>   at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:57)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> but when I change it to this (change highlighted in yellow):
>
> @Override
> public MergedAttrMapping map(
>     Pair<AttrMapping, Iterable<Integer>> attrMappingIterablePair) {
>   MergedAttrMapping out = mapper.map(attrMappingIterablePair.first());
>   StringBuilder ids = new StringBuilder();
>
>   for (Integer i : attrMappingIterablePair.second()) {
>     if (i != null && i != 0) {
>       ids.append(i);
>       ids.append('|');
>     }
>   }
>
>   if (ids.length() > 0) {
>     ids.deleteCharAt(ids.length() - 1);
>   }
>
>   out.setIds(ids.toString());
>   return out;
> }
>
> It does.
>
> Is this a bug?
>
> Thanks,
> Dave Ortiz
>
> This email is intended only for the use of the individual(s) to whom it
> is addressed. If you have received this communication in error, please
> immediately notify the sender and delete the original email.

-- 
Director of Data Science
Cloudera
Twitter: @josh_wills

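For reference, the enhanced for loop in the second quoted map() cannot call next() twice per iteration, because the language defines it as exactly one next() call for each successful hasNext() check. The sketch below is a stand-in under stated assumptions, not the original FteWarehouse code: JoinIdsDemo and joinIds are hypothetical names, a plain list replaces the grouped Crunch values, and the separator is written before each id instead of being trimmed afterwards.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it only mirrors the id-joining logic of the quoted map().
class JoinIdsDemo {

    // Joins the non-null, non-zero ids with '|'.
    static String joinIds(Iterable<Integer> ids) {
        StringBuilder out = new StringBuilder();
        // The enhanced for loop advances the iterator once per element.
        for (Integer i : ids) {
            if (i != null && i != 0) {
                if (out.length() > 0) {
                    out.append('|'); // separator goes before every id except the first
                }
                out.append(i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(7, null, 0, 12, 3);
        System.out.println(joinIds(ids)); // prints 7|12|3
    }
}
```

Writing the separator first avoids the trailing deleteCharAt step; either approach yields the same string, so this is a stylistic choice rather than a fix.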
--001a113a820e58924505177b95cd--