crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Iteratable Bug?
Date Mon, 01 Jun 2015 21:33:37 GMT
I just ran the following test and everything works fine. However, I did
get the exception you saw when I had a version of my code that called
it.next() twice inside the while loop. I noticed that the version of the
MergedAttrMapping MapFn that you sent in your original email doesn't
compile (the "I" variable is declared capitalized but then referenced as
"i"), so I'm wondering if the real code contains the same mistake I just
made (i.e., calling it.next() again after verifying that "i" wasn't null).

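import com.google.common.collect.ImmutableList;
import org.apache.crunch.MapFn;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;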
import org.apache.crunch.test.TemporaryPath;
import org.apache.crunch.test.TemporaryPaths;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.conf.Configuration;
import org.junit.Rule;
import org.junit.Test;

import java.util.Iterator;
import java.util.List;

public class NullIterableIT {

  @Rule
  public TemporaryPath tmpDir = TemporaryPaths.create();

  @Test
  public void testNullIterables() throws Exception {
    run(tmpDir.getDefaultConfiguration());
  }

  public static void run(Configuration conf) {
    Pipeline p = new MRPipeline(NullIterableIT.class, conf);
    List<Pair<String, Integer>> in = ImmutableList.of(
        Pair.of("a", 1), Pair.of("b", 1), Pair.of("b", 2),
        Pair.of("c", (Integer) null));
    PTable<String, Integer> input = p.create(in,
        Avros.tableOf(Avros.strings(), Avros.ints()));
    input.groupByKey().mapValues(new MapFn<Iterable<Integer>, Integer>() {
      @Override
      public Integer map(Iterable<Integer> input) {
        int sum = 0;
        Iterator<Integer> it = input.iterator();
        while (it.hasNext()) {
          Integer i = it.next();
          if (i != null) {
            sum += i;
          }
        }
        return sum;
      }
    }, Avros.ints()).materialize().iterator();
    p.done();
  }
}
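
As an aside, here is a minimal sketch of the double-next() mistake described
above, using only java.util and no Crunch. The class and method names
(DoubleNextDemo, sumOnce, sumTwice) are made up for illustration and are not
taken from David's pipeline. The exception message in a plain Java iterator
will also differ from Hadoop's "iterate past last value".

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class DoubleNextDemo {

  // Correct pattern: exactly one next() per successful hasNext().
  static int sumOnce(Iterable<Integer> values) {
    int sum = 0;
    Iterator<Integer> it = values.iterator();
    while (it.hasNext()) {
      Integer i = it.next();
      if (i != null) {
        sum += i;
      }
    }
    return sum;
  }

  // Buggy pattern (hypothetical): the second next() consumes an extra
  // element, and throws NoSuchElementException when the value that was
  // just null-checked happened to be the last one.
  static int sumTwice(Iterable<Integer> values) {
    int sum = 0;
    Iterator<Integer> it = values.iterator();
    while (it.hasNext()) {
      if (it.next() != null) {
        sum += it.next(); // BUG: advances the iterator a second time
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumOnce(Arrays.asList(1, null, 2)));
    try {
      sumTwice(Arrays.asList(1, 2, 3));
    } catch (NoSuchElementException e) {
      System.out.println("second next() ran past the end");
    }
  }
}
```

Note that with an even number of non-null values the buggy version does not
throw at all; it silently skips every other element, which may be why this
kind of mistake only shows up on some keys.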




On Mon, Jun 1, 2015 at 5:00 PM, David Ortiz <dortiz@videologygroup.com>
wrote:

> Here are the steps the pipeline goes through between the join and this
> map fn:
>
> · Outer Join yielding PTable<Integer, Pair<Integer, AttrMapping>>
> · MapFn which outputs each value entry from the join, with the
>   fields reversed, creating PTable<AttrMapping, Integer>
> · group by key
> · MapFn in question
>
>
>
> *From:* Josh Wills [mailto:jwills@cloudera.com]
> *Sent:* Monday, June 01, 2015 4:53 PM
>
> *To:* user@crunch.apache.org
> *Subject:* Re: Iteratable Bug?
>
>
>
> Okay...I think I need some more context around what is preceding this
> function. What is everything that happens between the outer join (using a
> JoinStrategy?) and this MapFn call?
>
>
>
> On Mon, Jun 1, 2015 at 4:47 PM, David Ortiz <dortiz@videologygroup.com>
> wrote:
>
>  I can’t say for sure it was an empty one that threw the exception, but
> that is processing the output of an outer join, so they definitely exist.
>
>
>
> *From:* Josh Wills [mailto:jwills@cloudera.com]
> *Sent:* Monday, June 01, 2015 4:42 PM
>
>
> *To:* user@crunch.apache.org
> *Subject:* Re: Iteratable Bug?
>
>
>
> I can't replicate it easily in master-- is the iterable in question empty
> by any chance?
>
>
>
> On Mon, Jun 1, 2015 at 2:34 PM, David Ortiz <dortiz@videologygroup.com>
> wrote:
>
>  This is 0.11.0-cdh5.3.2
>
>
>
> *From:* Josh Wills [mailto:jwills@cloudera.com]
> *Sent:* Monday, June 01, 2015 2:33 PM
>
>
> *To:* user@crunch.apache.org
> *Subject:* Re: Iteratable Bug?
>
>
>
> Yeah, that's odd. This is 0.12? Let me see if I can reproduce it.
>
>
>
> J
>
>
>
> On Mon, Jun 1, 2015 at 2:31 PM, David Ortiz <dortiz@videologygroup.com>
> wrote:
>
>  Hello Josh,
>
>
>
>    Sorry, it is the next() that is throwing the exception.
>
>
>
> Thanks,
>
>      Dave
>
>
>
> *From:* Josh Wills [mailto:jwills@cloudera.com]
> *Sent:* Monday, June 01, 2015 2:30 PM
> *To:* user@crunch.apache.org
> *Subject:* Re: Iteratable Bug?
>
>
>
> Hey David,
>
>
>
> It seems like it. Which line in the function is throwing the exception? Is
> it the hasNext(), or the next()?
>
>
>
> J
>
>
>
> On Mon, Jun 1, 2015 at 2:07 PM, David Ortiz <dortiz@videologygroup.com>
> wrote:
>
>  Hello everyone,
>
>
>
>      I noticed the following does not work in my pipeline:
>
>
>
> @Override
> public MergedAttrMapping map(Pair<AttrMapping, Iterable<Integer>>
> attrMappingIterablePair) {
>    MergedAttrMapping out = mapper.map(attrMappingIterablePair.first());
>    StringBuilder ids = new StringBuilder();
>
>    Iterator<Integer> it = attrMappingIterablePair.second().iterator();
>    while (it.hasNext()) {
>       Integer I = it.next();
>
>       if (i != null && i != 0) {
>          ids.append(i);
>          ids.append('|');
>       }
>    }
>
>    if (ids.length() > 0) {
>        ids.deleteCharAt(ids.length() - 1);
>    }
>
>    out.setIds(ids.toString());
>
>    return out;
> }
>
>
>
> Causing the following exception:
>
>
>
> Error: java.util.NoSuchElementException: iterate past last value
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:235)
>   at org.apache.crunch.types.avro.AvroPairConverter$AvroWrappedIterable$1.next(AvroPairConverter.java:103)
>   at org.apache.crunch.types.PGroupedTableType$HoldLastIterator.next(PGroupedTableType.java:84)
>   at com.videologygroup.crunch.FteWarehouse$1.map(FteWarehouse.java:268)
>   at com.videologygroup.crunch.FteWarehouse$1.map(FteWarehouse.java:257)
>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>   at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:113)
>   at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:57)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
>
>
> but when I change it to this (change highlighted in yellow):
>
>
>
> @Override
> public MergedAttrMapping map(Pair<AttrMapping, Iterable<Integer>>
> attrMappingIterablePair) {
>    MergedAttrMapping out = mapper.map(attrMappingIterablePair.first());
>    StringBuilder ids = new StringBuilder();
>
>    for (Integer i : attrMappingIterablePair.second()) {
>       if (i != null && i != 0) {
>          ids.append(i);
>          ids.append('|');
>       }
>    }
>
>    if (ids.length() > 0) {
>        ids.deleteCharAt(ids.length() - 1);
>    }
>
>    out.setIds(ids.toString());
>
>    return out;
> }
>
>
>
> It does.
>
>
>
> Is this a bug?
>
>
>
> Thanks,
>
>     Dave Ortiz
>
> *This email is intended only for the use of the individual(s) to whom it
> is addressed. If you have received this communication in error, please
> immediately notify the sender and delete the original email.*
>
>
>
>
>
> --
>
> Director of Data Science
>
> Cloudera <http://www.cloudera.com>
>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
