Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 02BE0200D5C for ; Fri, 15 Dec 2017 22:47:06 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 01057160C14; Fri, 15 Dec 2017 21:47:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 2127B160C04 for ; Fri, 15 Dec 2017 22:47:04 +0100 (CET) Received: (qmail 57734 invoked by uid 500); 15 Dec 2017 21:47:04 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 57725 invoked by uid 99); 15 Dec 2017 21:47:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 15 Dec 2017 21:47:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id B15DC1A039C for ; Fri, 15 Dec 2017 21:47:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id yjbsIsdjPDDK for ; Fri, 15 Dec 2017 21:47:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 579635FBED for ; Fri, 15 Dec 2017 21:47:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 7D742E045B for ; Fri, 15 Dec 2017 21:47:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 3CA5D27406 for ; Fri, 15 Dec 2017 21:47:00 +0000 (UTC) Date: Fri, 15 Dec 2017 21:47:00 +0000 (UTC) From: "Paul Rogers (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (DRILL-6035) Specify Drill's JSON behavior MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Fri, 15 Dec 2017 21:47:06 -0000 [ https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293265#comment-16293265 ] Paul Rogers edited comment on DRILL-6035 at 12/15/17 9:46 PM: -------------------------------------------------------------- Drill uses simple rules to infer scalar data types from JSON input: * If the value is quoted, it is a Nullable {{VARCHAR}}. Example: {{a: "varchar"}} * If the value is numeric, and has no decimal point, it is a Nullable {{BIGINT}}. Example: {{a: 10}} * If the value is numeric, and has a decimal point, it is a Nullable {{FLOAT8}}. Eample: {{a: 10.5}} * If the value is {{true}} or {{false}}, it is a Nullable {{BIT}}, stored as Nullable {{UINT1}}, with {{true}} stored as 1, {{false}} as 0. Example: {{a: true}} A special case occurs if the user enables {{store.json.all_text_mode}}. In this case, all cases above give rise to a nullable {{VARCHAR}} column type. h4. Nulls with Scalar Values The rules for {{null}} are a bit more complex. The rules below apply to the Drill 1.13 JSON reader, earlier versions require investigation. * If the {{null}} occurs after seeing one of the above types, then the value is a {{NULL}} of that type (such as a {{NULL}} {{VARCHAR}}.) * If the {{null}} occurs at the start of the file, Drill defers identifying the data type until it sees the first non-null value. * If the entire file (or first batch of a file: 4K records before Drill 1.13, variable number in Drill 1.13 or later) contains only {{null}} values, the column is defined as nullable {{VARCHAR}}. (Drill 1.12 and earlier likely has different behavior.) For example, in the following example, Drill 1.13 identifies the type as nullable {{BIGINT}}: {code} {a: null} {a: null} {a: null} {a: 10} {code} In the following example, Drill must guess that the data type is nullable {{VARCHAR}}: {code} {a: null} {a: null} {a: null} {code} was (Author: paul.rogers): Drill uses simple rules to infer scalar data types from JSON input: * If the value is quoted, it is a Nullable {{VARCHAR}}. Example: {{a: "varchar"}} * If the value is numeric, and has no decimal point, it is a Nullable {{BIGINT}}. Example: {{a: 10}} * If the value is numeric, and has a decimal point, it is a Nullable {{FLOAT8}}. Eample: {{a: 10.5}} * If the value is {{true}} or {{false}}, it is a Nullable {{BIT}}, stored as Nullable {{UINT1}}, with {{true}} stored as 1, {{false}} as 0. Example: {{a: true}} A special case occurs if the user enables {{store.json.all_text_mode}}. In this case, all cases above give rise to a nullable {{VARCHAR}} column type. The rules for {{null}} are a bit more complex. The rules below apply to the Drill 1.13 JSON reader, earlier versions require investigation. * If the {{null}} occurs after seeing one of the above types, then the value is a {{NULL}} of that type (such as a {{NULL}} {{VARCHAR}}.) * If the {{null}} occurs at the start of the file, Drill defers identifying the data type until it sees the first non-null value. * If the entire file (or first batch of a file: 4K records before Drill 1.13, variable number in Drill 1.13 or later) contains only {{null}} values, the column is defined as nullable {{VARCHAR}}. (Drill 1.12 and earlier likely has different behavior.) For example, in the following example, Drill 1.13 identifies the type as nullable {{BIGINT}}: {code} {a: null} {a: null} {a: null} {a: 10} {code} In the following example, Drill must guess that the data type is nullable {{VARCHAR}}: {code} {a: null} {a: null} {a: null} {code} > Specify Drill's JSON behavior > ----------------------------- > > Key: DRILL-6035 > URL: https://issues.apache.org/jira/browse/DRILL-6035 > Project: Apache Drill > Issue Type: Improvement > Affects Versions: 1.13.0 > Reporter: Paul Rogers > Assignee: Pritesh Maker > > Drill supports JSON as its native data format. However, experience suggests that Drill may have limitations in the JSON that Drill supports. This ticket asks to clarify Drill's expected behavior on various kinds of JSON. > Topics to be addressed: > * Relational vs. non-relational structures > * JSON structures used in practice and how they map to Drill > * Support for varying data types > * Support for missing values, especially across files > These topics are complex, hence the request to provide a detailed specifications that clarifies what Drill does and does not support (or what is should and should not support.) -- This message was sent by Atlassian JIRA (v6.4.14#64029)