drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jul...@hydromatic.net>
Subject Re: Dynamic columns a la Phoenix
Date Thu, 04 Dec 2014 19:27:29 GMT
On Dec 4, 2014, at 6:09 AM, Jacques Nadeau <jacques@apache.org> wrote:

> Very cool! Interesting approach.   It is kind of like create external table
> as a sub clause of the query.
> My question would be why it isn't just done in the select list.   I'm
> generally against adding constructs and new clauses in the parser,
> especially with repetitive information.

These “undeclared” columns can’t be added to the select list because you can’t reference
something that doesn’t exist. (Well, you can, and Drill does, but the approach of silently
creating something has issues. Weak typing zealots would shrug off those issues, and strong
typing zealots would be irritated by them.)

On balance, it doesn’t seem too repetitive: you’re just declaring the new columns; and
you’re declaring them just once; and it is an opportunity to provide a type, which quite
often reduces the verbosity of the code elsewhere.

This is not clear-cut decision for me. As Ted says, it doesn’t solve all problems, and I
see the benefits of other approaches too. But this is a nice application of the principle
of schema-on-read — arguably the most important thing that has happened to SQL in the last
20 years — bringing it out of DDL into the query.  So I’m inclined to add this construct
to Calcite’s core SQL grammar, and if you don’t like it, don’t use it, and it won’t
get in your way.

> We choose to use the existing CONVERT_FROM from sql to solve the datatype
> problem.  It's a bit verbose but avoids introducing new concepts to
> existing sql.  To simplify it, I've considered the other casting form which
> I believe is similar to the Java casting prefix. It would then be:
> SELECT (varchar) col1 FROM t1
> This doesn't distinguish between dynamic and existing columns.  We've shown
> that you don't need to do that but that is pretty complicated to do. If you
> wanted to avoid that,  maybe you could prefix the type. So a mix of dynamic
> and known might look like this :
> SELECT (?varchar) col1, col2 FROM t1

I don’t know CONVERT_FROM. Did you consider using CAST? E.g. CAST(x AS VARCHAR(10)).

Cast is very flexible, and unlike Java casts, it does conversions specific to source and target
type. In Java 

Object o = new Double(3.1415);
System.out.println((String) o));

will throw ClassCastException, but SQL

select cast(o as VARCHAR(10)) from (values (3.1415)) as t(o);

will print ‘3.1415’.

> That being said, I've worked with the parser and can guess the pain of
> trying to add it as optional to the existing select list parsing.
> Especially given scalar sub queries.

If you’re talking about the Java-style casting syntax with parentheses. Let’s just not
go there. It doesn’t belong in SQL any more than && for AND.

> If merged (even in parser definition files) I would definitely want it to
> be optional as it is.  That namespace is already cluttered and I think we
> should only merge new language features into core if they are sql spec or
> already used by the majority of projects. What about a language features
> contrib module?

My philosophy is to stick with the standard almost all of the time. I justify going beyond
the standard if there is no syntactic class with standard SQL and it is an area not addressed
by the standard. In my opinion this feature — and by the way JSON literals — falls into
this category.

Look at some of the other JIRA cases tagged “phoenix” and you will see that I am suggesting
putting those into a Phoenix-specific sub-class of the parser, because they don’t meet that
high standard.

View raw message