Date: Mon, 26 Jan 2015 15:11:56 -0800
Subject: Re: [DISCUSS] Change default json read behavior for numbers
From: Chris Westin
To: dev@drill.apache.org

JavaScript (and therefore JSON) defines all numbers as 64-bit floating
point, even if they're written without decimals. So if someone is writing
JSON, this would be their expectation. I would read them all as doubles.

=> http://www.w3schools.com/js/js_numbers.asp

On Mon, Jan 26, 2015 at 2:17 PM, Jacques Nadeau wrote:

> Writing a zero int to a float column should be allowed. Basically, if we
> found a float previously and then run across a zero, that should be
> accepted. This doesn't fix the situation where the first value was zero,
> but it definitely fixes many situations. I'm up for a second option to
> treat all numbers as doubles, but I don't support it as the default,
> since once we finish embedded types, this would be our desired behavior.
>
> On Mon, Jan 26, 2015 at 1:36 PM, Jason Altekruse wrote:
>
> > Hello Drillers,
> >
> > I am currently working on improving the error reporting in the JSON
> > reader to help users with files that Drill cannot read using the
> > default configuration today.
> >
> > As part of this change, I think it may be useful to change the default
> > behavior for reading numbers in JSON documents. Currently we fail on a
> > simple case: reading numbers with decimal points and then hitting a
> > value of 0 (or any number without a decimal point) in a later record.
> > The reason for the current behavior is to allow better precision for
> > files that contain only integers. The problem, however, is that we
> > currently fail on the basic case of a mix of integers and decimal
> > numbers. See [1] for more discussion on this.
> >
> > I propose that we switch the JSON reader to read all numbers as doubles
> > by default. The reader already contains a workaround, all_text_mode,
> > that allows lossless casting to integer and decimal types at some extra
> > computational overhead; see [2] for more information.
> >
> > Please share your thoughts on this change.
> >
> > [1] https://issues.apache.org/jira/browse/DRILL-1460
> > [2] https://issues.apache.org/jira/browse/DRILL-2071
> >
> > -Jason
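The read policies discussed in this thread can be sketched in a few lines. This is a minimal Python sketch, not Drill's actual Java reader. It assumes a simplified model in which the first value seen fixes a column's type. `infer_column`, `try_infer`, and `widen_int_to_float` are hypothetical names. The widening flag accepts any integer in a float column, which is slightly broader than the zero-only case raised above. `parse_int` and `parse_float` are real hooks of Python's standard `json` module. Parsing to strings and casting later is only a loose analogy to all_text_mode.

```python
import json
from decimal import Decimal


def infer_column(values, widen_int_to_float=False):
    """Hypothetical, simplified reader: the first value fixes the column type.

    With widen_int_to_float=True, an int arriving in a float column is
    widened to float instead of rejected. A float arriving in an int
    column is still rejected.
    """
    col_type = None
    out = []
    for v in values:
        t = float if isinstance(v, float) else int
        if col_type is None:
            col_type = t
        elif t is not col_type:
            if widen_int_to_float and col_type is float:
                v = float(v)  # accept the int by widening it
            else:
                raise TypeError(
                    f"column is {col_type.__name__}, got {t.__name__} {v!r}")
        out.append(v)
    return col_type, out


def try_infer(values, **kwargs):
    """Return (type, values) on success, or the error message on failure."""
    try:
        return infer_column(values, **kwargs)
    except TypeError as e:
        return str(e)


doc = '[{"x": 1.5}, {"x": 0}, {"x": 2.25}]'

# Default parse: 0 comes back as an int, so the strict reader fails on record 2.
xs = [r["x"] for r in json.loads(doc)]
print(try_infer(xs))  # column is float, got int 0

# Widening an int into an existing float column accepts this file.
print(try_infer(xs, widen_int_to_float=True))  # (<class 'float'>, [1.5, 0.0, 2.25])

# Proposed default: read every number as a double.
as_doubles = [r["x"] for r in json.loads(doc, parse_int=float)]
print(try_infer(as_doubles))  # (<class 'float'>, [1.5, 0.0, 2.25])

# Cost of all-doubles: integers above 2**53 are not exactly representable.
big = json.loads("9007199254740993", parse_int=float)
print(int(big))  # 9007199254740992

# Text first, cast later: keep the literal and convert it losslessly on demand.
as_text = json.loads('{"x": 0.1, "y": 9007199254740993}',
                     parse_float=str, parse_int=str)
print(Decimal(as_text["x"]), int(as_text["y"]))  # 0.1 9007199254740993
```

Under this model, widening alone does not cover a file whose first value is an integer. `try_infer([0, 1.5], widen_int_to_float=True)` still returns an error. Reading every number as a double avoids both failures, but exact integers beyond 2**53 are lost.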