Subject: Re: Problems with large dataset using collect() and broadcast()
From: Patrick Wendell
To: Will Yang
Cc: dev@spark.apache.org
Date: Wed, 24 Dec 2014 22:42:16 -0800

Hi Will,

When you call collect(), the item you are collecting needs to fit in memory on the driver. Is it possible your driver program does not have enough memory?

- Patrick

On Wed, Dec 24, 2014 at 9:34 PM, Will Yang wrote:
> Hi all,
> In my case I have a huge HashMap[(Int, Long), (Double, Double, Double)],
> several GB to tens of GB in size. After each iteration I need to collect()
> this HashMap, perform some calculation, and then broadcast() it to every
> node. I have 20GB per executor, and after it performs collect() the job
> gets stuck at "Added rdd_xx_xx", with no further response shown in the
> Application UI.
>
> I've tried lowering spark.shuffle.memoryFraction and
> spark.storage.memoryFraction, but it seems the job can only handle a
> HashMap of up to about 2GB. What should I optimize for such conditions?
>
> (ps: sorry for my bad English & grammar)
>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org
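[Editor's note: Patrick's point is that collect() materializes the entire result on the driver, so the driver JVM must be sized for it. Below is a minimal spark-submit sketch showing the relevant knobs; the class name, jar name, master, and memory sizes are hypothetical placeholders, not values from this thread.]

```shell
# Sketch only: adjust sizes to your cluster; class/jar names are hypothetical.
spark-submit \
  --class com.example.MyIterativeJob \
  --master yarn \
  --driver-memory 48g \
  --executor-memory 20g \
  --conf spark.driver.maxResultSize=0 \
  my-iterative-job.jar
```

--driver-memory must cover the collected HashMap plus JVM overhead (executor memory alone does not help collect()). spark.driver.maxResultSize, added in Spark 1.2, caps the total serialized size of results returned to the driver (default 1g; 0 disables the cap). Note also that Spark 1.x stored blocks in byte buffers limited to 2GB, so a single collected or broadcast block larger than that could fail regardless of heap size, which may explain the ~2GB ceiling observed here.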