spark-user mailing list archives

From Yadid Ayzenberg <ya...@media.mit.edu>
Subject Strange lookup behavior. Possible bug?
Date Fri, 25 Apr 2014 17:55:23 GMT
Hi All,

I'm running a lookup on a JavaPairRDD<String, Tuple2>.
When running on a local machine, the lookup is successful. However, when 
running on a standalone cluster with the exact same dataset, one of the 
tasks never ends (it stays in RUNNING status).
When viewing the worker log, it seems that the task has finished 
successfully:

14/04/25 13:40:38 INFO BlockManager: Found block rdd_2_0 locally
14/04/25 13:40:38 INFO Executor: Serialized size of result for 2 is 10896794
14/04/25 13:40:38 INFO Executor: Sending result for 2 directly to driver
14/04/25 13:40:38 INFO Executor: Finished task ID 2

But it seems the driver is not aware of this, and hangs indefinitely.
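
For scale, the result size reported in the log above works out to roughly 
10.4 MiB. Below is a minimal sketch for pulling that number out of a worker 
log line. The class and method names are hypothetical and not part of Spark, 
and whether the result size has anything to do with the hang is only a guess 
on my part.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: extracts the serialized result
// size from an Executor log line shaped like the one quoted above.
final class ResultSizeFromLog {
    private static final Pattern SIZE_LINE =
        Pattern.compile("Serialized size of result for (\\d+) is (\\d+)");

    // Returns the size in bytes, or empty if the line has a different shape.
    static OptionalLong parseResultBytes(String logLine) {
        Matcher m = SIZE_LINE.matcher(logLine);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(2)))
            : OptionalLong.empty();
    }

    // Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String line = "14/04/25 13:40:38 INFO Executor: "
            + "Serialized size of result for 2 is 10896794";
        parseResultBytes(line).ifPresent(b ->
            System.out.printf("task result: %d bytes (%.2f MiB)%n", b, toMiB(b)));
    }
}
```

Running the same helper over the worker logs of the Tuple2-keyed job that 
works would show whether its task results are noticeably smaller.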

If I execute a count prior to the lookup, I get the correct number, 
which suggests that the cluster is operating as expected.

The exact same scenario works with a different type of key (Tuple2): 
JavaPairRDD<Tuple2, Tuple2>.

Any ideas on how to debug this problem?

Thanks,

Yadid

