From: Yadid Ayzenberg
Date: Fri, 25 Apr 2014 13:55:23 -0400
To: user@spark.apache.org
Subject: Strange lookup behavior. Possible bug?

Hi All,

I'm running a lookup on a JavaPairRDD. When running on a local machine, the lookup is successful. However, when running on a standalone cluster with the exact same dataset, one of the tasks never ends (it stays in RUNNING status indefinitely).

When viewing the worker log, it seems that the task has finished successfully:

14/04/25 13:40:38 INFO BlockManager: Found block rdd_2_0 locally
14/04/25 13:40:38 INFO Executor: Serialized size of result for 2 is 10896794
14/04/25 13:40:38 INFO Executor: Sending result for 2 directly to driver
14/04/25 13:40:38 INFO Executor: Finished task ID 2

But it seems the driver is not aware of this, and hangs indefinitely. If I execute a count prior to the lookup, I get the correct number, which suggests that the cluster is operating as expected. The exact same scenario works with a different type of key (Tuple2): JavaPairRDD.

Any ideas on how to debug this problem?

Thanks,
Yadid
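
The result size reported in the worker log is easier to reason about in MiB, for instance when comparing it against whatever size limits the cluster happens to be configured with. The following is only a minimal sketch: the class and method names (`ResultSizeCheck`, `resultBytes`, `toMiB`) are hypothetical, and it assumes the executor log line has exactly the format quoted in the log excerpt. It does not diagnose the hang; it just extracts the number and converts it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: pulls the serialized task result
// size out of an executor log line and converts it to MiB.
public class ResultSizeCheck {
    // Assumes the log line format seen in the worker log excerpt.
    private static final Pattern SIZE_LINE =
        Pattern.compile("Serialized size of result for (\\d+) is (\\d+)");

    /** Returns the result size in bytes, or -1 if the line does not match. */
    public static long resultBytes(String logLine) {
        Matcher m = SIZE_LINE.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1L;
    }

    /** Converts a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String line = "14/04/25 13:40:38 INFO Executor: "
            + "Serialized size of result for 2 is 10896794";
        long bytes = resultBytes(line);
        System.out.println("task result: " + bytes + " bytes, "
            + toMiB(bytes) + " MiB");
    }
}
```

For the line in the log excerpt, this reports a single task result of a bit over 10 MiB being sent to the driver.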