Date: Mon, 24 Oct 2016 14:58:59 +0000 (UTC)
From: "Apache Spark (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Assigned] (SPARK-18076) Fix default Locale used in DateFormat, NumberFormat to Locale.US

     [ https://issues.apache.org/jira/browse/SPARK-18076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-18076:
------------------------------------

    Assignee: Apache Spark

> Fix default Locale used in DateFormat, NumberFormat to Locale.US
> ----------------------------------------------------------------
>
>                 Key: SPARK-18076
>                 URL: https://issues.apache.org/jira/browse/SPARK-18076
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Core, SQL
>    Affects Versions: 2.0.1
>            Reporter: Sean Owen
>            Assignee: Apache Spark
>
> Many parts of the code use {{DateFormat}} and {{NumberFormat}} instances. Although the behavior of these formats is mostly determined by things like format strings, the exact behavior can vary according to the platform's default locale. Although the locale defaults to "en", it can be set to something else by environment variables. If it is, the same code can succeed or fail based on locale alone:
> {code}
> import java.text._
> import java.util._
> // Parse with an explicit locale so the effect of each locale is visible.
> def parse(s: String, l: Locale) = new SimpleDateFormat("yyyyMMMdd", l).parse(s)
> parse("1989Dec31", Locale.US)
> // Sun Dec 31 00:00:00 GMT 1989
> parse("1989Dec31", Locale.UK)
> // Sun Dec 31 00:00:00 GMT 1989
> parse("1989Dec31", Locale.CHINA)
> // java.text.ParseException: Unparseable date: "1989Dec31"
> //   at java.text.DateFormat.parse(DateFormat.java:366)
> //   at .parse(:18)
> //   ... 32 elided
> parse("1989Dec31", Locale.GERMANY)
> // java.text.ParseException: Unparseable date: "1989Dec31"
> //   at java.text.DateFormat.parse(DateFormat.java:366)
> //   at .parse(:18)
> //   ... 32 elided
> {code}
> Where not otherwise specified, I believe all instances in the code should default to some fixed value, and that should probably be {{Locale.US}}. This matches the JVM's default, and specifies both language ("en") and region ("US") to remove ambiguity. This most closely matches what the current code behavior would be (unless the default locale was changed), because it will currently default to "en".
> This affects SQL date/time functions. At the moment, the only SQL function that lets the user specify language/country is "sentences", which is consistent with Hive.
> It affects dates passed in the JSON API.
> It affects some strings rendered in the UI, potentially.
> Although this isn't a correctness issue, there may be an argument for not letting that vary (?)
> It affects a bunch of instances where dates are formatted into strings for things like IDs or file names, which is far less likely to cause a problem, but worth making consistent.
> The other occurrences are in tests.
> The downside to this change is also its upside: the behavior doesn't depend on the default JVM locale, but it also can't be affected by the default JVM locale. For example, if you wanted to parse some dates in a way that depended on a non-US locale (not just the format string), then it would no longer be possible. There's no means of specifying this, for example, in SQL functions for parsing dates. However, controlling this by globally changing the locale isn't exactly great either.
> The purpose of this change is to make the current default behavior deterministic and fixed. PR coming.
> CC [~hyukjin.kwon]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
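
The change proposed in the issue amounts to passing an explicit Locale wherever a format is constructed, rather than relying on Locale.getDefault. A minimal sketch of that idea follows. The names FixedLocaleFormats, usDateFormat and usNumberFormat are hypothetical and are not taken from the Spark code base or from the PR mentioned above; pinning the time zone to UTC is an extra assumption made here only so the printed date is stable across machines.

```scala
import java.text.{NumberFormat, SimpleDateFormat}
import java.util.{Locale, TimeZone}

// Hypothetical helpers for illustration only; not Spark's actual API.
object FixedLocaleFormats {

  // Build a date format that ignores the JVM default locale.
  // A new instance is returned on each call because SimpleDateFormat
  // is not thread-safe.
  def usDateFormat(pattern: String): SimpleDateFormat = {
    val fmt = new SimpleDateFormat(pattern, Locale.US)
    // Assumption: fix the zone as well, so formatting is deterministic.
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt
  }

  // Build a number format that ignores the JVM default locale.
  def usNumberFormat(): NumberFormat = NumberFormat.getNumberInstance(Locale.US)

  def main(args: Array[String]): Unit = {
    val saved = Locale.getDefault
    // Simulate a platform whose default locale is not English.
    Locale.setDefault(Locale.GERMANY)
    try {
      val date = usDateFormat("yyyyMMMdd").parse("1989Dec31")
      println(usDateFormat("yyyy-MM-dd").format(date)) // 1989-12-31
      println(usNumberFormat().format(1234.5))         // 1,234.5
    } finally {
      // Restore the original default so other code is unaffected.
      Locale.setDefault(saved)
    }
  }
}
```

With the default locale switched to Locale.GERMANY, the same input that failed in the issue's example still parses, because the helpers never consult the default. The trade-off is the one the issue describes: callers who really want locale-dependent parsing would need their own format with a different explicit Locale.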