pig-dev mailing list archives

From "Gianmarco De Francisci Morales (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2932) Setting high default_parallel causes IOException in local mode
Date Wed, 26 Sep 2012 13:43:07 GMT
Gianmarco De Francisci Morales created PIG-2932:

             Summary: Setting high default_parallel causes IOException in local mode
                 Key: PIG-2932
                 URL: https://issues.apache.org/jira/browse/PIG-2932
             Project: Pig
          Issue Type: Bug
            Reporter: Gianmarco De Francisci Morales
            Priority: Critical

This bug has been confirmed only in local mode.

When default_parallel is set to a high value, Pig fails on some operations.
The following data and script reproduce the bug.

grunt> cat file.txt                                  
11	1	qwer
12	2	qwerty
13	3	ert
13	3	ertyu
14	4	zxcv
16	6	fsdfg
16	6	fdfghj
18	8	fjklopi

SET default_parallel 9
a = load 'file.txt' as (id1:int, id2:int, str:chararray);
b = group a by (id1,id2);
c = foreach b generate flatten(group), a;
d = order c by group::id1 ASC, group::id2 ASC;
dump d

2012-09-26 15:28:13,230 [Thread-32] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C:  R: 
2012-09-26 15:28:13,232 [Thread-32] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

The script succeeds if default_parallel is set to 2.
My guess is that the failure occurs because default_parallel is higher than the number of
unique keys, probably due to some quirk in ORDER BY.
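
As a possible aid to diagnosis: the trailing "(1)" in the exception message appears to be the partition index that the map output collector rejected. The sketch below is not Hadoop's code. It is a self-contained illustration of a bounds check of that kind, using hypothetical names (PartitionCheck, checkPartition). It assumes, without confirmation from this report, that the local job ran with a single reduce task while the ORDER BY partitioner had been sized for a higher parallelism.

```java
import java.io.IOException;

public class PartitionCheck {

    // Hypothetical helper: accept a partition index only if it lies in
    // the range [0, numPartitions), otherwise raise an IOException whose
    // message has the same shape as the one in the log above.
    static void checkPartition(String key, int partition, int numPartitions)
            throws IOException {
        if (partition < 0 || partition >= numPartitions) {
            throw new IOException(
                "Illegal partition for " + key + " (" + partition + ")");
        }
    }

    public static void main(String[] args) {
        // Key text copied from the log; the reduce-task count of 1 is an
        // assumption about local mode, not something the report states.
        String key = "Null: false index: 0 (12,2)";
        int assumedReduceTasks = 1;
        try {
            checkPartition(key, 1, assumedReduceTasks);
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, index 1 is out of range when only one partition exists, so the program prints a message of the same form as the one in the stack trace. Whether this is the actual cause of PIG-2932 would still need to be checked against the partitioner used by ORDER BY.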
