hadoop-pig-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-51) Combiner gives wrong result in the presence of flattening
Date Fri, 14 Dec 2007 20:34:43 GMT

    [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551935
] 

Ted Dunning commented on PIG-51:
--------------------------------


I think I am still seeing this issue, or a close cousin of it, even after applying this
patch, though I don't know enough to be sure.

grunt> ls /logs/search/2007/12/10
/logs/search/2007/12/10/part-00000<r 3>	1313515859
/logs/search/2007/12/10/part-00001<r 3>	1313535390
/logs/search/2007/12/10/part-00002<r 3>	1313485045
/logs/search/2007/12/10/part-00003<r 3>	1313536061
grunt>  a = load '/logs/search/2007/12/10' as (eventType, date, month,
week, day, hour, id, videoId, VisitorUID, engineName, query, offset);
grunt>  b = filter a by (id neq '-');
grunt>  c = group b by id;
grunt>  describe c
c: (group, b: (eventType, date, month, week, day, hour, id, videoId, VisitorUID, engineName,
query, offset ) )
grunt>  d = foreach c {
>>  click = filter b by eventType eq '/search/click';
>>  generate COUNT(click);
>>  }
grunt>  describe d
d: (count1 )
grunt>  e = group d by 1;
grunt>  describe e
e: (group: ( ), d: (count1 ) )
grunt>  f = foreach e generate COUNT(*), SUM(d.count1);
grunt> dump f

----- MapReduce Job -----
Input: [/logs/search/2007/12/10:org.apache.pig.builtin.PigStorage()]
Map: [[*]->[FILTER BY ([PROJECT $6] neq ['-'])]]
Group: [GENERATE {[PROJECT $6],[*]}]
Combine: null
Reduce: GENERATE {[COUNT(GENERATE {[PROJECT $1]->[FILTER BY ([PROJECT $0] eq ['/search/click'])]})]}
Output: /tmp/temp1435257199/tmp1109313480:org.apache.pig.builtin.BinStorage
Split: null
Map parallelism: -1
Reduce parallelism: -1
Job jar size = 482135
2007-12-14 12:04:44,776 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:04:57,832 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:04:59,841 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:05:01,849 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:05:03,857 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:05:05,865 [main] INFO  org.apache.pig - Pig progress = 1%
2007-12-14 12:05:07,873 [main] INFO  org.apache.pig - Pig progress = 1%
2007-12-14 12:05:09,881 [main] INFO  org.apache.pig - Pig progress = 2%
2007-12-14 12:05:11,889 [main] INFO  org.apache.pig - Pig progress = 2%
2007-12-14 12:05:13,897 [main] INFO  org.apache.pig - Pig progress = 2%
2007-12-14 12:05:15,905 [main] INFO  org.apache.pig - Pig progress = 3%
2007-12-14 12:05:17,913 [main] INFO  org.apache.pig - Pig progress = 3%
2007-12-14 12:05:21,929 [main] INFO  org.apache.pig - Pig progress = 3%
2007-12-14 12:05:23,937 [main] INFO  org.apache.pig - Pig progress = 4%
2007-12-14 12:05:25,945 [main] INFO  org.apache.pig - Pig progress = 4%
2007-12-14 12:05:27,953 [main] INFO  org.apache.pig - Pig progress = 4%
2007-12-14 12:05:29,961 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:31,969 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:33,977 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:37,993 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:40,001 [main] INFO  org.apache.pig - Pig progress = 6%
2007-12-14 12:05:42,009 [main] INFO  org.apache.pig - Pig progress = 6%
2007-12-14 12:05:44,016 [main] INFO  org.apache.pig - Pig progress = 6%
2007-12-14 12:05:46,024 [main] INFO  org.apache.pig - Pig progress = 7%
2007-12-14 12:05:48,032 [main] INFO  org.apache.pig - Pig progress = 7%
2007-12-14 12:05:50,040 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:52,051 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:54,060 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:56,068 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:58,077 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:06:00,085 [main] INFO  org.apache.pig - Pig progress = 9%
2007-12-14 12:06:02,092 [main] INFO  org.apache.pig - Pig progress = 9%
2007-12-14 12:06:04,100 [main] INFO  org.apache.pig - Pig progress = 9%
2007-12-14 12:06:08,116 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:10,124 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:12,133 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:18,160 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:20,168 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:22,176 [main] INFO  org.apache.pig - Pig progress = 11%
2007-12-14 12:06:24,184 [main] INFO  org.apache.pig - Pig progress = 11%
2007-12-14 12:06:26,192 [main] INFO  org.apache.pig - Pig progress = 11%
2007-12-14 12:06:28,201 [main] INFO  org.apache.pig - Pig progress = 12%
2007-12-14 12:06:30,208 [main] INFO  org.apache.pig - Pig progress = 12%
2007-12-14 12:06:32,216 [main] INFO  org.apache.pig - Pig progress = 12%
2007-12-14 12:06:34,224 [main] INFO  org.apache.pig - Pig progress = 13%
2007-12-14 12:06:36,232 [main] INFO  org.apache.pig - Pig progress = 13%
2007-12-14 12:06:38,240 [main] INFO  org.apache.pig - Pig progress = 13%
2007-12-14 12:06:40,251 [main] INFO  org.apache.pig - Pig progress = 14%
2007-12-14 12:06:42,260 [main] INFO  org.apache.pig - Pig progress = 14%
2007-12-14 12:06:44,268 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:46,276 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:48,285 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:50,292 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:52,300 [main] INFO  org.apache.pig - Pig progress = 16%
2007-12-14 12:06:56,316 [main] INFO  org.apache.pig - Pig progress = 16%
2007-12-14 12:06:58,324 [main] INFO  org.apache.pig - Pig progress = 17%
2007-12-14 12:07:00,332 [main] INFO  org.apache.pig - Pig progress = 17%
2007-12-14 12:07:02,340 [main] INFO  org.apache.pig - Pig progress = 17%
2007-12-14 12:07:04,348 [main] INFO  org.apache.pig - Pig progress = 17%
...
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000071
java.lang.RuntimeException: java.io.IOException: Column number out of range: 6 -- (                 )
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:95)
	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
	at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
	at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
	at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:58)
	at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Caused by: java.io.IOException: Column number out of range: 6 -- (                 )
	at org.apache.pig.data.Tuple.getField(Tuple.java:147)
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:85)
	... 7 more

2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000072
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000073
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000074
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000075
java.lang.RuntimeException: java.io.IOException: Column number out of range: 6 -- (full, 50)
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:95)
	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
	at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
	at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
	at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:58)
	at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Caused by: java.io.IOException: Column number out of range: 6 -- (full, 50)
	at org.apache.pig.data.Tuple.getField(Tuple.java:147)
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:85)
	... 7 more

2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000076
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000079
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000000
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000001
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000002
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000003
Job failed
grunt> 
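
One possible reading of the failure above, offered as a guess rather than a diagnosis: the tuples printed in the exceptions, such as (full, 50), have fewer than seven fields, so the map-side filter on column 6 (id) has nothing to project. The Python sketch below imitates that behaviour outside Pig. The function names and the sample rows are hypothetical and made up for illustration; only the two-field tuple and the error text come from the log.

```python
# Hypothetical sketch: imitate projecting column 6 (the `id` field) from
# tab-separated rows, as the map-side FILTER in the plan above does.
# This is not Pig code; names and sample rows are invented for illustration.

def project(fields, index):
    """Return fields[index], failing on short rows the way the log above does."""
    if index >= len(fields):
        raise IOError(
            f"Column number out of range: {index} -- ({', '.join(fields)})"
        )
    return fields[index]


def filter_id_not_dash(lines, id_column=6):
    """Keep rows whose id column is not '-'; set aside rows too short to test."""
    kept, malformed = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            if project(fields, id_column) != "-":
                kept.append(fields)
        except IOError:
            malformed.append(fields)
    return kept, malformed


# Made-up sample: a full 12-field row, a row with id '-', and the
# two-field tuple that appears in the stack trace.
sample = [
    "\t".join(["/search/click", "d", "m", "w", "d", "h", "id1", "v", "u", "e", "q", "0"]),
    "\t".join(["/search/view", "d", "m", "w", "d", "h", "-", "v", "u", "e", "q", "0"]),
    "full\t50",
]
kept, malformed = filter_id_not_dash(sample)
print(len(kept), malformed)
```

If short rows really are the cause, counting them per input file in this way would show whether the problem lies in the data rather than in the combiner patch.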

> Combiner gives wrong result in the presence of flattening
> ---------------------------------------------------------
>
>                 Key: PIG-51
>                 URL: https://issues.apache.org/jira/browse/PIG-51
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Utkarsh Srivastava
>            Priority: Critical
>         Attachments: combiner-flatten.patch
>
>
> If you do something like
> a = load ... as (f1,f2,f3);
> b = group a by (f1,f2);
> c = foreach b generate flatten(group), SUM(a.f3);
> The reduce side refers to fields by number, expecting that the data has not been flattened yet.
> But if the combiner kicks in, it has already flattened the group, so the column references
> are wrong.
>  
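The mismatch described in the issue can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Pig's actual implementation: `reduce_sum`, `combine_unflattened`, and `combine_flattened` are hypothetical names, and the record layout (group tuple in column 0, values in column 1) is assumed for illustration.

```python
# Hypothetical sketch of positional-reference breakage; not Pig's actual code.

def reduce_sum(record):
    """Reduce step written for unflattened input: $0 = group tuple, $1 = values."""
    group, values = record[0], record[1]
    return (*group, sum(values))


def combine_unflattened(group, values):
    """Combiner that keeps the layout the reducer expects."""
    return (group, [sum(values)])


def combine_flattened(group, values):
    """Combiner that flattens the group early, shifting every later column."""
    return (*group, [sum(values)])


group, values = ("x", "y"), [1, 2, 3]

# With the expected layout the reducer finds the partial sum in column 1.
ok = reduce_sum(combine_unflattened(group, values))

# With the flattened layout column 1 now holds the second group field,
# so the same positional reference picks up the wrong data.
try:
    reduce_sum(combine_flattened(group, values))
    broke = False
except TypeError:
    broke = True

print(ok, broke)
```

In this toy model the wrong column happens to raise an error; with numeric group fields it could instead produce a silently wrong sum, which matches the "wrong result" wording of the issue title.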

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

