From: Timothy Chen
To: drill
Cc: Apache Drill User
Date: Fri, 1 Nov 2013 22:51:54 -0700
Subject: Re: Schema discovery

Hi Julian,

Glad to have someone respond to this :)

Yes, I think going beyond simply having no schema defined up front, and actually giving users options, is definitely a much better interactive experience.

I would imagine, though, that it could impact Drill, or perhaps require building more statistics capabilities in Drill to query schema info. Not all data is raw files; some of it lives in different data stores, so I think we would need to go through the Drill storage engine abstraction to get that information.

I'll chat about this with Jacques and folks next Monday or in the Drill user group.

Tim

On Fri, Nov 1, 2013 at 4:51 PM, Julian Hyde wrote:

> A recent blog post by Daniel Abadi has a similar theme:
>
> http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/
>
> We could create a tool that scans the raw files and generates an Optiq
> schema that contains views that apply "late schema" (the "EMP" and "DEPT"
> views in
> https://raw.github.com/apache/incubator-drill/HEAD/sqlparser/src/test/resources/test-models.json
> are examples of this). The user could interactively modify that schema
> (e.g. change a column's type from string to boolean or integer).
>
> It's a nice approach because it doesn't impact the Drill engine. This is
> good. Metadata and data should be kept separate wherever possible.
>
> Julian
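
To make the scanning idea in the quoted message concrete, here is a minimal Python sketch. It assumes the raw file is CSV with a header row. The function names (`infer_type`, `discover_schema`) and the type labels are hypothetical, and nothing here uses Optiq's or Drill's actual APIs; it only illustrates inferring column types from sample data and then letting the user override a guess.

```python
import csv
import io


def infer_type(values):
    """Guess the narrowest type that fits every non-empty sample value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "string"  # nothing to go on, so fall back to string
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    try:
        for v in non_empty:
            int(v)
        return "integer"
    except ValueError:
        return "string"


def discover_schema(text, overrides=None):
    """Scan CSV text and return {column name: inferred type}.

    `overrides` stands in for the interactive step: the user can
    replace any inferred type (e.g. force a column back to string).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    schema = {}
    for i, name in enumerate(header):
        column = [row[i] for row in data if i < len(row)]
        schema[name] = infer_type(column)
    schema.update(overrides or {})
    return schema


sample = "empno,name,active\n1,Fred,true\n2,Eric,false\n"
inferred = discover_schema(sample)
adjusted = discover_schema(sample, overrides={"empno": "string"})
```

The inferred mapping is kept apart from the data itself, so a user can adjust it without touching the files, which is in the spirit of keeping metadata and data separate.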