From: GitBox
To: reviews@spark.apache.org
Date: Mon, 30 Nov 2020 17:54:38 -0000
Subject: [GitHub] [spark] rdblue commented on pull request #29066: [SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes

rdblue commented on pull request #29066:
URL: https://github.com/apache/spark/pull/29066#issuecomment-735942977

> I am interested in what other devs think and whether we are OK breaking the existing API.

Since the other API is targeted at the read path, I would have no problem adding this one in parallel under a `write` package.
I think that we should deprecate the read-side distribution because it doesn't really help with bucketed joins. I'm also fine with changing the existing API, but I'd rather just deprecate it and remove it once we have a replacement for bucketed joins and other read-side optimizations.

> Probably worth raising a discussion on the dev@ mailing list?

Yes. But if we want to get this into 3.1.0, we should start moving on everything in parallel. We should start getting the addition of `Write` done, because it needs to carry the `RequiresDistributionAndSort` interface no matter what we decide about `Distribution`. And we can at least get a WIP PR up to add the new distribution interfaces.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org