incubator-drill-user mailing list archives

From Julian Hyde <julianh...@gmail.com>
Subject Re: Schema discovery
Date Fri, 01 Nov 2013 23:51:10 GMT
A recent blog post by Daniel Abadi has a similar theme:

http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/

We could create a tool that scans the raw files and generates an Optiq schema containing
views that apply "late schema" (the "EMP" and "DEPT" views in https://raw.github.com/apache/incubator-drill/HEAD/sqlparser/src/test/resources/test-models.json
are examples of this). The user could then interactively modify that schema (e.g. change a
column's type from string to boolean or integer).
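Here is a minimal Python sketch of the kind of tool described above. It assumes the scanner has already read a few sample rows from the raw files as string-valued dicts. Several pieces are assumptions and are not Optiq's or Drill's actual API: the type-inference heuristic, the `_MAP['field']` accessor in the view SQL (loosely modeled on the EMP/DEPT views), and the JSON model layout. The function names (`infer_type`, `infer_columns`, `view_sql`, `make_model`) and the table names are hypothetical.

```python
import json


def infer_type(values):
    """Guess a SQL type from sample string values (a simple heuristic, not Drill's)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    # Try the narrowest numeric type first, then fall back to a wider one.
    for sql_type, parse in (("INTEGER", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                parse(v)
            return sql_type
        except ValueError:
            continue
    return "VARCHAR"


def infer_columns(sample_rows):
    """Map each field name seen in the sample (in first-seen order) to a SQL type."""
    names = []
    for row in sample_rows:
        for name in row:
            if name not in names:
                names.append(name)
    return {n: infer_type([row.get(n, "") for row in sample_rows]) for n in names}


def view_sql(raw_table, columns):
    """Build a 'late schema' view that casts each raw field to its chosen type.
    The _MAP['field'] accessor is an assumption, not a confirmed Drill syntax."""
    select_list = ",\n  ".join(
        f"CAST(_MAP['{name}'] AS {sql_type}) AS {name.upper()}"
        for name, sql_type in columns.items()
    )
    return f"SELECT\n  {select_list}\nFROM {raw_table}"


def make_model(schema, view_name, raw_table, columns):
    """Wrap the view in a JSON model (field layout assumed, loosely after Optiq's)."""
    return {
        "version": "1.0",
        "defaultSchema": schema,
        "schemas": [{
            "name": schema,
            "tables": [{"name": view_name, "type": "view",
                        "sql": view_sql(raw_table, columns)}],
        }],
    }


if __name__ == "__main__":
    # Hypothetical sample rows, as a scanner might read them from raw files.
    sample = [
        {"empno": "100", "name": "Fred", "deptno": "10", "active": "true"},
        {"empno": "110", "name": "Eric", "deptno": "n/a", "active": "false"},
    ]
    columns = infer_columns(sample)
    # deptno is inferred as VARCHAR; the user interactively changes it to INTEGER.
    columns["deptno"] = "INTEGER"
    model = make_model("HR", "EMP", "HR.EMPLOYEES_RAW", columns)
    print(json.dumps(model, indent=2))
```

In this sketch the user's edit only rewrites the view's SQL inside the generated model. The raw files and the query engine are left untouched.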

It's a nice approach because it requires no changes to the Drill engine. That is a good
thing: metadata and data should be kept separate wherever possible.

Julian