Date: Tue, 5 Apr 2016 09:46:59 -0700 (MST)
From: dsing001
To: user@spark.apache.org
Subject: Plan issue with spark 1.5.2

I am using Spark 1.5.2 and have a question about the query plans it generates.

I have 3 data-frames that hold data for different countries. There are around 150 countries and the data is skewed. About 95% of my queries will have country as a criterion. However, I have seen issues with the plans generated for queries that use country as a join column.

The data-frames are partitioned by country. They are not only co-partitioned but also co-located, e.g. the data for UK in data-frames df1, df2 and df3 is on the same HDFS datanode.

When I join these 3 tables with country as one of the join columns (there are other join columns besides country), I would expect a map-side join. Instead, Spark shuffles the data from all 3 data-frames and then joins the shuffled data.

Is this the correct behavior? If it is an issue, is it fixed in later versions?

Thanks
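
The difference between the expected and the observed behavior can be illustrated with a minimal plain-Python sketch. This is not Spark code and does not reflect how Spark's planner works internally. The data, the row layout `(country, id, value...)`, and the function names `local_join`, `copartitioned_join` and `shuffle_join` are all hypothetical and exist only for illustration. Each data-frame is modeled as a dict from country to the rows of that country's partition.

```python
from collections import defaultdict

# Hypothetical data: country -> rows of that partition.
# Each row is (country, id, value); (country, id) is the join key.
df1 = {"UK": [("UK", 1, "a"), ("UK", 2, "b")], "IN": [("IN", 1, "c")]}
df2 = {"UK": [("UK", 1, "x")], "IN": [("IN", 1, "y"), ("IN", 3, "z")]}
df3 = {"UK": [("UK", 1, "p")], "IN": [("IN", 1, "q")]}


def local_join(left_rows, right_rows):
    """Hash-join two lists of rows on their first two fields."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[:2]].append(row)
    # Append the right row's non-key fields to each matching left row.
    return [l + r[2:] for l in left_rows for r in index.get(l[:2], [])]


def copartitioned_join(left, right):
    """Join partition by partition; no row leaves its country partition."""
    return {c: local_join(left[c], right[c]) for c in left.keys() & right.keys()}


def shuffle_join(left, right, num_buckets=4):
    """Redistribute every row by a hash of the join key, then join per bucket."""
    def rebucket(parts):
        buckets = defaultdict(list)
        for rows in parts.values():
            for row in rows:
                buckets[hash(row[:2]) % num_buckets].append(row)
        return buckets

    lb, rb = rebucket(left), rebucket(right)
    return [out for b in lb for out in local_join(lb[b], rb.get(b, []))]


# Expected: two partition-local joins, nothing is moved between partitions.
local_result = copartitioned_join(copartitioned_join(df1, df2), df3)

# Observed: all three inputs are redistributed before joining.
shuffled_12 = {"all": shuffle_join(df1, df2)}
shuffled_result = shuffle_join(shuffled_12, df3)
```

Both paths produce the same rows. The partition-local version only ever reads rows that already sit in the same country partition, which is the behavior the question expects when country is part of the join key.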