pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4438) Can not work when in "limit after sort" situation in spark mode
Date Tue, 17 Mar 2015 01:44:38 GMT

     [ https://issues.apache.org/jira/browse/PIG-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

liyunzhang_intel updated PIG-4438:
----------------------------------
    Attachment: PIG-4438_1.patch

PIG-4438_1.patch is an initial patch. I hit some problems when running the script from the bug
description and need more time to investigate. The error is:
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
java.lang.ClassCastException: java.lang.Byte cannot be cast to java.util.Iterator
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:85)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
        at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:987)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{code}

> Can not work when in "limit after sort" situation in spark mode
> ---------------------------------------------------------------
>
>                 Key: PIG-4438
>                 URL: https://issues.apache.org/jira/browse/PIG-4438
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4438_1.patch
>
>
> When a Pig script executes "order" before "limit" in spark mode, the result is wrong.
> cat testlimit.txt
> 1	orange
> 3	coconut
> 5	grape
> 6	pear
> 2	apple
> 4	mango
> testlimit.pig:
> a = load './testlimit.txt' as (x:int, y:chararray);
> b = order a by x;
> c = limit b 1;
> store c into './testlimit.out';
> the actual result:
> 1	orange
> 2	apple
> 3	coconut
> 4	mango
> 5	grape
> 6	pear
> the correct result should be:
> 1	orange
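
For reference, the intended "order by, then limit" semantics on the sample data can be sketched outside Pig/Spark. This is only an illustration of what the plan should compute, not the converter code; the data literals are copied from the bug description above.

```python
# Sketch of "b = order a by x; c = limit b 1;" semantics
# over the sample rows from testlimit.txt.
rows = [
    (1, "orange"), (3, "coconut"), (5, "grape"),
    (6, "pear"), (2, "apple"), (4, "mango"),
]

# ORDER BY x: sort on the first field.
ordered = sorted(rows, key=lambda r: r[0])

# LIMIT 1: keep only the first row of the sorted output.
limited = ordered[:1]

print(limited)  # [(1, 'orange')] -- one row, not all six
```

The bug is that spark mode currently returns all six sorted rows, i.e. the limit step is effectively dropped after the sort.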



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
