drill-user mailing list archives

From Timothy Chen <tnac...@gmail.com>
Subject Re: Schema discovery
Date Sat, 02 Nov 2013 05:51:54 GMT
Hi Julian,

Glad to have someone respond to this :) Yes, I think going beyond just
having no schema defined up front to actually giving users options is
definitely a much better interactive experience.

I would imagine, though, that it could impact Drill, or at least require
building more statistics capabilities into Drill to query schema info. Not
all data is just raw files; some of it lives in different data stores, so I
think we would need to go through the Drill storage engine abstraction to
get that info.

I'll chat about this with Jacques and folks next Monday or in the Drill
user group.


On Fri, Nov 1, 2013 at 4:51 PM, Julian Hyde <julianhyde@gmail.com> wrote:

> A recent blog post by Daniel Abadi has a similar theme:
> http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/
> We could create a tool that scans the raw files and generates an Optiq
> schema that contains views that apply "late schema" (the "EMP" and "DEPT"
> views in
> https://raw.github.com/apache/incubator-drill/HEAD/sqlparser/src/test/resources/test-models.json
> are examples of this). The user could interactively modify that schema
> (e.g. change a column's type from string to boolean or integer).
> It's a nice approach because it doesn't impact the Drill engine. This is
> good. Metadata and data should be kept separate wherever possible.
> Julian
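
To make the scan-and-infer idea in the quoted message a bit more concrete, here is a minimal Python sketch. It is not Optiq or Drill code: the function names, the type names ("boolean", "integer", "double", "string"), and the sample records are all assumptions made up for illustration. It guesses a type per column from raw string values and then lets the user override a guess, roughly like changing a column's type from string to integer.

```python
from typing import Dict, Iterable, Mapping, Optional

BOOLEAN_LITERALS = {"true", "false"}


def infer_value_type(raw: str) -> str:
    """Guess the narrowest type name that fits one raw string value."""
    text = raw.strip()
    if text.lower() in BOOLEAN_LITERALS:
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def merge_types(current: Optional[str], observed: str) -> str:
    """Combine the type seen so far for a column with a newly observed one."""
    if current is None or current == observed:
        return observed
    if {current, observed} == {"integer", "double"}:
        return "double"  # widen integer to double
    return "string"  # anything else falls back to string


def infer_schema(
    records: Iterable[Mapping[str, str]],
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Scan raw records and return a column -> type mapping.

    `overrides` stands in for the user's interactive edits and always wins
    over the inferred type.
    """
    schema: Dict[str, str] = {}
    for record in records:
        for column, raw in record.items():
            schema[column] = merge_types(schema.get(column), infer_value_type(raw))
    schema.update(overrides or {})
    return schema


# Hypothetical sample data, loosely modeled on an "EMP"-style file.
sample = [
    {"EMPNO": "100", "NAME": "Fred", "ACTIVE": "true", "SALARY": "1000"},
    {"EMPNO": "110", "NAME": "Eric", "ACTIVE": "false", "SALARY": "1250.5"},
]
inferred = infer_schema(sample)
edited = infer_schema(sample, overrides={"EMPNO": "string"})
```

The inferred mapping is only metadata about the files; nothing in the sketch touches how the data is read, which is the separation the quoted message argues for. A real tool would then emit that mapping as view definitions in a schema file rather than as a Python dict.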
