pig-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Fwd: e2e tests for Rank function
Date Wed, 26 Sep 2012 10:07:07 GMT
Forwarding to pig-dev.

In summary, it looks like we have a regression in trunk.
We need to investigate it before branching 0.11.

Cheers,
--
Gianmarco



---------- Forwarded message ----------
From: Allan <aavendan@gmail.com>
Date: Wed, Sep 26, 2012 at 11:21 AM
Subject: Re: e2e tests for Rank function
To: cheolsoo <cheolsoo@cloudera.com>, Gianmarco De Francisci Morales <gdfm@apache.org>


Hi Cheolsoo and Gianmarco,

I double-checked the e2e tests and reproduced the scenario. The report is
correct: the tests are failing.

Then, looking for a possible reason, I tried the following script:

SET default_parallel 9;
A = LOAD 'prerank' using PigStorage(',') as
(rownumber:long,rankcabd:long,rankbdaa:long,rankbdca:long,rankaacd:long,rankaaba:long,a:int,b:int,c:int,tail:bytearray);
B = group A by (a, b);
C = foreach B generate flatten(group),A;
D = order C by group::a ASC, group::b ASC;


And it fails with the same exception message.

Then I tried the same script without the "SET default_parallel 9;" line,
and it works. So I'm really surprised that in local mode it doesn't work
with parallelism.
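
A possible reading of the exception message, offered as an assumption and not
a confirmed diagnosis: the trailing "(1)" looks like the partition index chosen
for the key, and such an index would be rejected if the local runner has only a
single reduce partition (index 0) while keys are still being spread over more
partitions. The Java sketch below only illustrates that kind of range check.
The class and method names are hypothetical; this is not the Hadoop or Pig
source.

```java
import java.io.IOException;

// Hypothetical illustration of a map-side partition range check.
class PartitionCheckSketch {

    // Rejects any partition index outside [0, numPartitions).
    static void checkPartition(Object key, int partition, int numPartitions)
            throws IOException {
        if (partition < 0 || partition >= numPartitions) {
            throw new IOException(
                "Illegal partition for " + key + " (" + partition + ")");
        }
    }

    public static void main(String[] args) {
        String key = "Null: false index: 0 (1,7)";
        try {
            // Assumption: only one reduce partition is actually available.
            checkPartition(key, 0, 1); // index 0 is in range
            checkPartition(key, 1, 1); // index 1 is out of range and throws
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, it would be consistent with the script passing once
default_parallel is removed, but that still needs to be verified against the
actual code paths.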

The reason for using this script is that the RANK (RANK BY) operator uses
the same chain of operators: a GROUP (B), a flatten (C), and a SORT (D).

Best regards,

On Sun, Sep 23, 2012 at 10:43 PM, Cheolsoo Park <cheolsoo@cloudera.com> wrote:

> Hello,
>
> The e2e tests for Rank function in trunk do not pass for me when running in
> local mode. I am wondering whether they all pass for everyone.
>
> What I am doing is as follows:
>
> ant clean
> ant -Dhadoopversion=20 ... test-e2e-deploy-local
> ant -Dhadoopversion=20 ... test-e2e-local -Dtests.to.run="-t Rank"
>
> All tests except Rank_4 fail with errors similar to this:
>
> java.io.IOException: Illegal partition for Null: false index: 0 (1,7) (1)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
> I wanted to double check whether I am doing something wrong before I open a
> jira.
>
> Thanks,
> Cheolsoo
>



-- 

Allan Avendaño S.
Computer Engineer
SWY22 Participant
GSOC 2012 Participant
Rome - Italy
Gmail: aavendan@gmail.com
--
