Date: Mon, 19 Mar 2018 08:43:00 +0000 (UTC)
From: "Herman van Hovell (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-23599) The UUID() expression is too non-deterministic

    [ https://issues.apache.org/jira/browse/SPARK-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404511#comment-16404511 ]

Herman van Hovell commented on SPARK-23599:
-------------------------------------------

PR 1 out of 2 has been merged.

> The UUID() expression is too non-deterministic
> ----------------------------------------------
>
>                 Key: SPARK-23599
>                 URL: https://issues.apache.org/jira/browse/SPARK-23599
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Herman van Hovell
>            Assignee: Liang-Chi Hsieh
>            Priority: Critical
>
> The current {{Uuid()}} expression uses {{java.util.UUID.randomUUID}} for UUID generation. There are a couple of major problems with this:
> - It is non-deterministic across task retries. This breaks Spark's processing model and leads to bugs that are very hard to trace, such as non-deterministic shuffles, duplicate rows and missing rows.
> - It uses a single shared {{SecureRandom}} instance for UUID generation. This means a single JVM-wide lock, which can lead to lock contention and other performance problems.
> We should move to something that is deterministic between retries. This can be done by using seeded PRNGs for which we set the seed during planning.
> It is important here to use a PRNG that provides enough entropy for creating a proper UUID.
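The proposal above (seeded PRNGs whose seed is fixed during planning) can be sketched in Java as follows. This is a hypothetical illustration, not Spark's implementation: the class name `SeededUuidGenerator`, the `planningSeed` parameter, the per-partition seed derivation (`planningSeed + partitionIndex`) and the choice of `java.util.SplittableRandom` are all assumptions. Because each generator's stream depends only on values fixed before the task starts, a retried task regenerates exactly the same UUIDs, and because every task owns a private generator instance, there is no shared JVM-wide lock as with `UUID.randomUUID()`.

```java
import java.util.SplittableRandom;
import java.util.UUID;

/**
 * Hypothetical sketch (not Spark's code): a UUID generator whose output depends
 * only on a seed chosen once at planning time plus the partition index, so a
 * retried task produces exactly the same sequence of UUIDs.
 */
final class SeededUuidGenerator {
    // Private, per-task PRNG: no lock is shared with other tasks in the JVM.
    private final SplittableRandom random;

    SeededUuidGenerator(long planningSeed, int partitionIndex) {
        // Assumed seed derivation: within one query the planning seed is fixed,
        // so distinct partitions get distinct seeds.
        this.random = new SplittableRandom(planningSeed + partitionIndex);
    }

    /** Returns the next UUID in RFC 4122 version-4 layout from the seeded stream. */
    UUID nextUuid() {
        long msb = random.nextLong();
        long lsb = random.nextLong();
        // Force the version nibble (bits 12..15 of the high word) to 4.
        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L;
        // Force the two variant bits to binary 10 (the RFC 4122 variant).
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // Simulate an original task attempt and a retry of the same partition.
        SeededUuidGenerator firstAttempt = new SeededUuidGenerator(42L, 3);
        SeededUuidGenerator retry = new SeededUuidGenerator(42L, 3);
        for (int i = 0; i < 3; i++) {
            UUID a = firstAttempt.nextUuid();
            UUID b = retry.nextUuid();
            System.out.println(a + " same on retry: " + a.equals(b));
        }
    }
}
```

This sketch also illustrates the entropy caveat raised in the issue: every UUID in a partition's stream is fully determined by a 64-bit seed, so the stream carries at most 64 bits of entropy even though 122 bits of each UUID are filled from the PRNG. A generator with a larger state or a wider seed may be preferable; the sketch does not attempt to settle that choice.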