Date: Fri, 9 Oct 2015 12:07:16 -0700 (MST)
From: unk1102
To: user@spark.apache.org
Subject: How to tune unavoidable group by query?

Hi,

I have the following group-by query, which I have tried both through the DataFrame API and through hiveContext.sql(). Both versions shuffle a huge amount of data and are slow. I pass around 8 fields as the group-by columns:

    sourceFrame.select("blabla").groupBy("col1", "col2", "col3", ..., "col8").agg("bla bla");

or

    hiveContext.sql("insert into table partitions bla bla group by col1, col2, col3, ..., col8");

I have tried almost all of the tuning parameters, such as tungsten, lz4 shuffle, and more shuffle.storage (around 6.0). I am using Spark 1.4.0.

Please guide me. Thanks in advance.