drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2807) Revisit SQL array indexing _before GA/1.0_ (non-std. zero-based vs. std. one-based)
Date Thu, 16 Apr 2015 23:07:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498940#comment-14498940

Jacques Nadeau commented on DRILL-2807:

[~dsbos], thanks for sharing your thoughts.  It is always interesting to hear new perspectives
on what we're doing on Drill.  I think your proposal has a narrower definition of consistency
than what I would suggest.

A few things that I consider:

One of Drill's goals has always been to be very much in alignment with Javascript and JSON.
 That is why our configuration files are JSON, our plans are JSON, our profiles are JSON and
we're the first big data engine that is designed to work with all the things that come from
that.  Consistency of data model paradigm aligns directly with JSON and Javascript, a far
richer and covering set over SQL.

Complex tree traversal is a beast unto itself.  In this space, the SQL community's existing
responses have seen poor adoption to newer models like those provided by tools like MongoDB
and Couchbase.   Thus consistency with successful companion technologies supports a JSON like
approach.  (Remember, we even have a native MongoDB connector.)

There are really two common complex object traversal models that are commonly used today.
 The first is xpath and the second is javascript/json.  A SQL identifier's schema.table.column
actually matches directly with the map reference model of JSON and thus continuing to match
that pattern makes sense.  Consistency with a popular object traversal patterns is very important.

We made the design decision early on that we would not even attempt to follow or support the
SQL specification with regards to complex data. JSON and Javascript are much more in alignment
with our goals.  Consistency with what we communicated with the world is important to building
new user trust.

We haven't had a single piece of feedback from users that this was confusing or an issue.
 Despite us still being 1.0, we have a large number of users who have already been trained
on the current approach.  Consistency with what our users already leverage is probably the
most important thing we must maintain.

So in my mind, the current approach strongly supports consistency.  

I'm very comfortable with the current strategy and am -1 on a change.

> Revisit SQL array indexing _before GA/1.0_ (non-std. zero-based vs. std. one-based)
> -----------------------------------------------------------------------------------
>                 Key: DRILL-2807
>                 URL: https://issues.apache.org/jira/browse/DRILL-2807
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Daniel Barclay (Drill)
> Currently, Drill's SQL uses zero-based indexing for arrays.  ({{A\[0]}} gets the first
element of the array value of {{A}}.)
> This is not compatible with standard SQL, which uses one-based indexing for arrays. 
({{A\[1]}} gets the first element of the array value of {{A}}.)
> &nbsp;
> Most probably, Drill should use standard one-base indexing.
> More definitely, this should be revisited before 1.0/GA, because it will be much harder
to change the index origin later.
> (Even though it could be made changeable later, that would still be confusing--the meaning
of an indexed array expression wouldn't be clear without knowledge of the current setting.)
> &nbsp;
> Some arguments directly for using standard SQL one-based array indexing are:
> 1. The SQL standard specifies that array indexing is one-based, so Drill should use one-based
indexing.  \\  \\ This is true especially because Drill is intended to be compatible with
standard SQL rather than having its own dialect of SQL (other than where necessary because
of the essentially different nature of Drill).
> 2. Since standard SQL's array indexing is one-based, users familiar with arrays will
expect one-based indexing. Drill shouldn't surprise users, or make them have to remember another
difference between Drill and standard SQL.
> 3. Even users unfamiliar with SQL arrays will expect their indexing to be  one-based
since the rest of standard SQL also uses one-based indexing:
>   - Column numbers are one-based:
>     -- in "ORDER BY 1"
>   - Character and binary string positions are one-based
>     -- in definition of SUBSTRING
>     -- in "If <start position> is not specified, then 1 (one) is implicit."
> &nbsp;
> The reason that Drill's SQL array indexing currently is zero-based is that previously
it was thought that it would be good to be  consistent with indexing in JavaScript, which
is expected to be used by a significant number of Drill users to process JSON output from
or input to Drill.
> However, besides the direct arguments above, there are additional arguments for not making
Drill's SQL non-standard in order to try to be consistent with JavaScript:
> 1. SQL simply isn't JavaScript, just as it isn't Java, C, C++, or any other zero-based
language that can be used to process data read into or written from Drill. \\ \\ (One could
argue that SQL _should_ have been designed to be zero-based in the first place, but it wasn't,
and so now it simply isn't.)
> 2. Drill's SQL doesn't forgo compatibility with the SQL standard to make other things
zero-based in order to try to be consistent with JavaScript:
>   - It doesn't make string index zero- based (in {{SUBSTRING}}, etc.).
>   - It doesn't make column indexes zero-based (e.g., changing standard "{{ORDER BY 1}}"
to non-standard "{{ORDER BY 0}}").
> 3. If Drill were defining a JavaScript API for accessing SQL-related data from JavaScript
(as JDBC provides access from Java), then the index origin used in method parameters in that
API could be zero-based:
>   - It could be chosen to be zero-based to be consistent with JavaScript rather than
being consistent with SQL.
>   - It could be chosen to be one-based to be consistent with SQL rather than being consistent
with JavaScript.  However, note that it could still take and return JavaScript objects, which
would still use JavaScript's zero-based indexing--just as JDBC uses SQL-consistent one-based
indexing in its  methods, but also returns Java objects which of course are indexed in Java's
zero-based way.
>   - (More JDBC/Java details:
>     -- is one-based in SQL part (JDBC Java objects representing SQL concepts):
>       --- column numbers (e.g., {{resultSet.getDate(1)}} for first column)
>       --- row numbers (e.g., {{resultSet.absolute(1)}} for first now) 
>       --- element numbers (e.g., {{array.getArray(1, 1)}} for first element, {{Array.getResultSet(...)}}'s
element numbers)
>       --- character/byte numbers in Clob/Blob (e.g., {{getSubString(1, ...)}} for first,
{{position(...)}}, {{getBytes(1...)}} starts at first)
>     -- is zero-based once back at pure-Java level (non-JDBC objects representing values):
>       --- {{resultSet.getString(...)}}, then {{.charAt(0)}} on string
>       --- {{resultSet.getBytes(...)}}, then {{\[0]}} on byte array
>       --- {{resultSet.getArray(...)}}, then {{.getArray(...)}}, then {{\[0]}} on array
>     -- So, if you want to use Java's zero-based indexing, you retrieve data into generic
Java types and then do so.  Java's SQL API (JDBC) doesn't do everything zero-based even though
the host language is zero-based.
> )
> However, making the JavaScript _API_'s indexing consistent with JavaScript still does
not imply having Drill use non-standard zero-based indexing in Drill's _SQL_.

This message was sent by Atlassian JIRA

View raw message