drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5562) Vector types IntervalYear, IntervalDay and Interval are of the wrong width
Date Mon, 05 Jun 2017 00:56:04 GMT
Paul Rogers created DRILL-5562:
----------------------------------

             Summary: Vector types IntervalYear, IntervalDay and Interval are of the wrong width
                 Key: DRILL-5562
                 URL: https://issues.apache.org/jira/browse/DRILL-5562
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


Drill provides three interval types, described in `ValueVectorTypes.tdd`:

* `IntervalYear`: a duration in months (sic)
* `IntervalDay`: a duration in days and ms.
* `Interval`: a duration in months, days and ms.

The file defines the width of each "field" (ms, days, months) as an int: 4 bytes. But the
total vector width of each type is 4 bytes larger than the sum of its fields:

* `IntervalYear`: 8 bytes (should be 4: for months)
* `IntervalDay`: 12 bytes (should be 8: for days and ms.)
* `Interval`: 16 bytes (should be 12: for months, days and ms.)
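
As a rough check of the arithmetic above, the following sketch recomputes the expected width of each type from its field count and compares it with the declared width quoted in this issue. The class and method names are hypothetical and are not part of Drill; the declared widths are copied from the list above, not read from `ValueVectorTypes.tdd`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Drill code: checks the width arithmetic in this issue.
final class IntervalWidths {
    // Each field (months, days, ms) is a 4-byte int, per the issue text.
    static final int FIELD_WIDTH = 4;

    // Expected width is the number of fields times 4 bytes.
    static int expectedWidth(int fieldCount) {
        return fieldCount * FIELD_WIDTH;
    }

    public static void main(String[] args) {
        // type name -> { field count, declared width as reported in this issue }
        Map<String, int[]> types = new LinkedHashMap<>();
        types.put("IntervalYear", new int[] {1, 8});
        types.put("IntervalDay", new int[] {2, 12});
        types.put("Interval", new int[] {3, 16});

        for (Map.Entry<String, int[]> e : types.entrySet()) {
            int expected = expectedWidth(e.getValue()[0]);
            int declared = e.getValue()[1];
            System.out.printf("%s: declared=%d expected=%d extra=%d%n",
                e.getKey(), declared, expected, declared - expected);
        }
    }
}
```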

The extra 4 bytes may have been intended for a time zone. But time zones don't
apply to intervals: an hour is the same duration everywhere on earth.

Since an interval does not contain a point in time, a time zone is not useful even for daylight
saving time adjustments.

The code for each type confirms that the extra 4 bytes go unused. For example, the setter for
the 12-byte `IntervalDay` vector writes only 8 bytes per value:

{code}
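    public void set(int index, int days, int milliseconds) {
      // VALUE_WIDTH is 12 for IntervalDay, but only bytes 0..7 of each slot are written.
      final int offsetIndex = index * VALUE_WIDTH;
      data.setInt(offsetIndex, days);                // bytes 0..3: days
      data.setInt((offsetIndex + 4), milliseconds);  // bytes 4..7: ms
    }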
{code}
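
To make the unused bytes visible, here is a minimal standalone sketch of the same slot layout. It uses `java.nio.ByteBuffer` as a stand-in for Drill's buffer class (an assumption made only so the example runs without Drill), and the class name `IntervalDaySlot` is hypothetical. The buffer is pre-filled with a marker byte so that any bytes the setter does not touch remain recognizable.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the IntervalDay vector, not Drill code.
final class IntervalDaySlot {
    // Slot width reported in this issue for IntervalDay.
    static final int VALUE_WIDTH = 12;

    // Same write pattern as the setter quoted above: two 4-byte ints per slot.
    static void set(ByteBuffer data, int index, int days, int milliseconds) {
        final int offsetIndex = index * VALUE_WIDTH;
        data.putInt(offsetIndex, days);
        data.putInt(offsetIndex + 4, milliseconds);
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(2 * VALUE_WIDTH);
        // Fill with a marker so untouched bytes can be spotted afterwards.
        for (int i = 0; i < data.capacity(); i++) {
            data.put(i, (byte) 0x7F);
        }

        set(data, 1, 3, 500);

        int base = 1 * VALUE_WIDTH;
        System.out.println("days = " + data.getInt(base));
        System.out.println("ms   = " + data.getInt(base + 4));
        // Bytes 8..11 of the slot still hold the marker pattern.
        System.out.println("tail = 0x" + Integer.toHexString(data.getInt(base + 8)));
    }
}
```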

Note also that the Drill `IntervalDay` need not be two fields wide. Except on a leap second,
a day has a fixed number of milliseconds. And the only way to compensate for a leap second
is to know a point in time, which the interval does not have. Even if measured across a leap
second, an interval of a minute is always 60 seconds. The leap second matters only when
computing:

{code}
end date/time = start date/time + interval
{code}

and that computation has the start date/time available.

Although the ISO 8601 format expresses intervals as a tuple of (year, month, day, hour, minute,
second), the same value can be expressed as (months, ms) (with the proper conversions), so
Drill's interval types need only be 4 and 8 bytes wide.
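
The conversions mentioned above can be sketched as follows. This is an illustration under stated assumptions, not Drill code: the class and method names are hypothetical, a day is taken as exactly 86,400,000 ms (leap seconds ignored, as argued above), and `long` is used for the results only so the arithmetic in the sketch cannot overflow. It says nothing about which storage width Drill should choose.

```java
// Hypothetical helper, not Drill code: collapses interval fields as described above.
final class IntervalCollapse {
    // Assumes a fixed-length day; leap seconds need a point in time, which an interval lacks.
    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    // (days, ms) -> a single millisecond count.
    static long dayIntervalToMillis(int days, int milliseconds) {
        return days * MS_PER_DAY + milliseconds;
    }

    // (year, month, day, hour, minute, second) -> { months, ms }.
    static long[] toMonthsAndMillis(int years, int months, int days,
                                    int hours, int minutes, int seconds) {
        long totalMonths = years * 12L + months;
        long totalMs = days * MS_PER_DAY
            + hours * 3_600_000L
            + minutes * 60_000L
            + seconds * 1_000L;
        return new long[] {totalMonths, totalMs};
    }

    public static void main(String[] args) {
        System.out.println(dayIntervalToMillis(1, 500));
        long[] pair = toMonthsAndMillis(1, 2, 3, 4, 5, 6);
        System.out.println("months=" + pair[0] + " ms=" + pair[1]);
    }
}
```

Months and milliseconds stay separate because a month has no fixed length in days, so the two parts cannot be merged without a start date.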



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
