spark-dev mailing list archives

From Nathan Kronenfeld <nkronenf...@oculusinfo.com>
Subject Re: Problem with tests
Date Fri, 22 Nov 2013 21:02:31 GMT
Actually, looking into recent commits, it looks like my hunch may be
exactly correct:
https://github.com/apache/incubator-spark/commit/f639b65eabcc8666b74af8f13a37c5fdf7e0185f
"PartitionPruningRDD is using index from parent"

Is there anyone who can explain why this new behavior is preferable?  And,
if it's staying, can anyone suggest a way to fix my tests for this case?
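
To make the difference concrete, here is a minimal sketch of the two indexing
behaviors a pruning step could have: one where the surviving partitions keep
the index they had in the parent, and one where they are renumbered from 0.
This is a toy model only. `Part`, `PruneIndexSketch`, and both prune functions
are hypothetical names, not Spark classes, and the sketch does not claim to
reproduce what the commit above actually changed. It only mirrors the
#2 versus #0 difference described in the quoted message below.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
final case class Part(index: Int, data: Seq[String])

object PruneIndexSketch {
  // Behavior A: surviving partitions keep the index they had in the parent.
  def pruneKeepingParentIndex(parent: Seq[Part], keep: Int => Boolean): Seq[Part] =
    parent.filter(p => keep(p.index))

  // Behavior B: surviving partitions are renumbered starting from 0.
  def pruneRenumbering(parent: Seq[Part], keep: Int => Boolean): Seq[Part] =
    parent
      .filter(p => keep(p.index))
      .zipWithIndex
      .map { case (p, i) => p.copy(index = i) }

  def main(args: Array[String]): Unit = {
    // Four partitions, like the outer UnionRDD in the quoted message.
    val parent = Seq(
      Part(0, Seq("a")),
      Part(1, Seq("b")),
      Part(2, Seq("c")),
      Part(3, Seq("d"))
    )

    // Prune out everything but partition 2.
    val kept = pruneKeepingParentIndex(parent, _ == 2)
    val renumbered = pruneRenumbering(parent, _ == 2)

    println(kept.map(_.index))       // List(2)
    println(renumbered.map(_.index)) // List(0)
  }
}
```

A test that asserts on the index seen after pruning passes under only one of
these behaviors, which would be consistent with the same test passing on one
checkout and failing on another.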

Thanks again,
                 Nathan


On Fri, Nov 22, 2013 at 3:56 PM, Nathan Kronenfeld <
nkronenfeld@oculusinfo.com> wrote:

> Hi there.
>
> I have a problem with the unit tests on a pull request I'm trying to tie
> up.  The changes deal with partition-related functions.
>
> In particular, the tests I have that test an append-to-partition function
> work fine on my own machine, but fail on the build machine (
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2152/console
> ).
>
> The failure seems to stem from pulling a single partition out of the set.
> On either machine, when I work on the full dataset:
>
> UnionRDD[11] at apply at FunSuite.scala:1265 (4 partitions)
>   UnionRDD[9] at apply at FunSuite.scala:1265 (3 partitions)
>     ParallelCollectionRDD[8] at apply at FunSuite.scala:1265 (1 partitions)
>     MapPartitionsWithContextRDD[7] at apply at FunSuite.scala:1265 (2 partitions)
>       ParallelCollectionRDD[4] at apply at FunSuite.scala:1265 (2 partitions)
>   ParallelCollectionRDD[10] at apply at FunSuite.scala:1265 (1 partitions)
>
>
> It seems to work.  When I pull one partition out of this, by wrapping a
> PartitionPruningRDD around it (pruning out everything but partition 2):
>
> PartitionPruningRDD[12] at apply at FunSuite.scala:1265 (1 partitions)
>   UnionRDD[11] at apply at FunSuite.scala:1265 (4 partitions)
>     UnionRDD[9] at apply at FunSuite.scala:1265 (3 partitions)
>       ParallelCollectionRDD[8] at apply at FunSuite.scala:1265 (1 partitions)
>       MapPartitionsWithContextRDD[7] at apply at FunSuite.scala:1265 (2 partitions)
>         ParallelCollectionRDD[4] at apply at FunSuite.scala:1265 (2 partitions)
>     ParallelCollectionRDD[10] at apply at FunSuite.scala:1265 (1 partitions)
>
>
> In this case, my local machine and the build machine seem to act
> differently.
>
> On my local machine, what is in the inner ParallelCollection partition #2
> shows up in the MapPartitionsWithContextRDD as partition #2 still.  On the
> build machine, this same partition shows up in the later RDD as partition
> #0 - presumably because everything else is pruned out, but that pruning
> should happen at an outer level, shouldn't it?
>
> Does anyone know why the build machine would behave differently from my
> local machine here?
>
> Also, sadly, this worked fine two days ago.
>
> My only thought is that perhaps the PullRequestBuilder does a merge with
> current code, and someone broke this in the last day or two?  Past that,
> I'm at a bit of a loss.
>
> Thanks,
>                     -Nathan
>
>
> --
>
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  nkronenfeld@oculusinfo.com
>



-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenfeld@oculusinfo.com
