drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5329) External sort does not support "obscure" data types
Date Mon, 20 Mar 2017 22:41:41 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5329:
-------------------------------
    Description: 
A unit test was created to exercise the "Sorter" mechanism within the External Sort, which
is used to sort each incoming batch. The sorter was tested with each Drill data type.

The following types fail:

* TINYINT
* UINT1
* SMALLINT
* UINT2
* UINT4
* UINT8
* VAR16CHAR
* DECIMAL28SPARSE
* DECIMAL38SPARSE

The types that work include:

* INT
* BIGINT
* FLOAT4
* FLOAT8
* DECIMAL9
* DECIMAL18
* VARCHAR
* VARBINARY
* DATE
* TIME
* TIMESTAMP
* INTERVAL
* INTERVALDAY
* INTERVALYEAR

Could not find a way to test the following:

* DECIMAL28DENSE
* DECIMAL38DENSE
* LIST
* MAP
* GENERIC_OBJECT
* UNION

Not yet supported in Drill:

* MONEY
* FIXEDCHAR
* FIXED16CHAR
* FIXEDBINARY
* NULL
* TIMETZ
* TIMESTAMPTZ
* LATE

The failure manifests in one of two ways:

* If dynamic UDFs are enabled, the query crashes with an NPE. (See DRILL-5331.)
* If dynamic UDFs are disabled, the generated code silently skips the comparison step, resulting
in the sort not actually being done:

Sorting a set of 20 pseudo-random rows produces the following output:

{code}
#, row #, key, value
0(0): 11, "0"
1(1): 14, "1"
2(2): 17, "2"
3(3): 0, "3"
{code}

By contrast, the (working) INT type produces the correct results:

{code}
#, row #, key, value
0(3): 0, "3"
1(10): 1, "10"
2(17): 2, "17"
3(4): 3, "4"
{code}

The first number is the output position; the number in parentheses is the row that the sv2 points
to at that position (the sorter should rewrite the sv2 to produce sort order). Sort was done ASC,
NULLS_HIGH, by the key field.
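To make the sv2 indirection concrete, here is a minimal Java sketch. It assumes an int array can stand in for the sv2 and a nullable Integer array for the key column; the class and method names are hypothetical and are not Drill's API. The key values never move; only the row indexes are reordered (ASC, nulls last), so output position i reads row sv2[i]. For the keys 11, 14, null, 0, 3 a correct sort yields the sv2 [3, 4, 0, 1, 2].

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical sketch (not Drill code): sorts a batch indirectly through a
 * selection vector. The key column stays in place; only the row-index array
 * (standing in for the sv2) is reordered. Null keys sort last (ASC, NULLS_HIGH).
 */
public class Sv2SortSketch {

    /** Returns an sv2 whose i-th entry is the row that belongs at output position i. */
    static int[] sortSv2(Integer[] keys) {
        Integer[] sv2 = new Integer[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sv2[i] = i; // identity mapping: the order of an unsorted batch
        }
        // Compare rows by their key values, treating null as greater than any value.
        Comparator<Integer> byKey = Comparator.comparing(
            (Integer row) -> keys[row],
            Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
        Arrays.sort(sv2, byKey); // reorders row indexes only; keys[] is untouched
        return Arrays.stream(sv2).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Integer[] keys = {11, 14, null, 0, 3};
        int[] sv2 = sortSv2(keys);
        // Same layout as the listings above: position(row): key
        for (int i = 0; i < sv2.length; i++) {
            System.out.printf("%d(%d): %s%n", i, sv2[i], keys[sv2[i]]);
        }
    }
}
```

The sketch uses Arrays.sort for brevity; it is stable for object arrays, which is a property of the sketch rather than a statement about Drill's generated sorter.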

A strong concern here is that there is no error or other warning to the user that Drill cannot
sort this type; Drill just silently declines to perform the operation.
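A minimal sketch of how a test can catch this silent failure: read the keys through the sv2 and check that they come out in ASC, NULLS_HIGH order. The names are hypothetical and this is not Drill's test framework. A comparator that always returns 0 stands in for generated code that skips the comparison step; because the sort then sees every row as equal, the sv2 keeps its identity mapping, as in the failing output above. The keys used are the first four from that output.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch (not Drill code): detect a sort that silently did nothing. */
public class SilentSortCheck {

    /** Sorts row indexes 0..n-1 with the given row comparator and returns them as an sv2. */
    static int[] sortSv2(int n, Comparator<Integer> rowCompare) {
        Integer[] sv2 = new Integer[n];
        for (int i = 0; i < n; i++) {
            sv2[i] = i; // identity mapping before sorting
        }
        Arrays.sort(sv2, rowCompare);
        return Arrays.stream(sv2).mapToInt(Integer::intValue).toArray();
    }

    /** True if the keys read through sv2 are ascending with nulls last (ASC, NULLS_HIGH). */
    static boolean isSortedAscNullsHigh(Integer[] keys, int[] sv2) {
        Comparator<Integer> order = Comparator.nullsLast(Comparator.<Integer>naturalOrder());
        for (int i = 1; i < sv2.length; i++) {
            if (order.compare(keys[sv2[i - 1]], keys[sv2[i]]) > 0) {
                return false; // an adjacent pair is out of order
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Integer[] keys = {11, 14, 17, 0};
        Comparator<Integer> order = Comparator.nullsLast(Comparator.<Integer>naturalOrder());

        // Working case: the comparator actually compares key values.
        int[] good = sortSv2(keys.length, (a, b) -> order.compare(keys[a], keys[b]));

        // Failure mode: comparison skipped, so every pair reports "equal".
        int[] skipped = sortSv2(keys.length, (a, b) -> 0);

        System.out.println(Arrays.toString(good) + " sorted=" + isSortedAscNullsHigh(keys, good));
        // prints "[3, 0, 1, 2] sorted=true"
        System.out.println(Arrays.toString(skipped) + " sorted=" + isSortedAscNullsHigh(keys, skipped));
        // prints "[0, 1, 2, 3] sorted=false"
    }
}
```

Checking the output order, rather than only that the query completed, is what exposes the problem: the no-op sort raises no error, so only the ordering check fails.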

  was:
A unit test was created to exercise the "Sorter" mechanism within the External Sort, which
is used to sort each incoming batch. The sorter was tested with each Drill data type.

The following types fail:

* TINYINT
* UINT1
* SMALLINT
* UINT2
* UINT4
* UINT8
* VAR16CHAR
* DECIMAL28SPARSE
* DECIMAL38SPARSE

The types that work include:

* INT
* BIGINT
* FLOAT4
* FLOAT8
* DECIMAL9
* DECIMAL18
* VARCHAR
* VARBINARY
* DATE
* TIME
* TIMESTAMP
* INTERVALYEAR

Could not find a way to test the following:

* DECIMAL28DENSE
* DECIMAL38DENSE
* LIST
* GENERIC_OBJECT
* UNION
* INTERVAL
* INTERVALDAY

Not yet supported in Drill:

* MONEY
* FIXEDCHAR
* FIXED16CHAR
* FIXEDBINARY
* NULL
* TIMETZ
* TIMESTAMPTZ
* LATE

The failure manifests in one of two ways:

* If dynamic UDFs are enabled, the query crashes with an NPE. (See DRILL-5331.)
* If dynamic UDFs are disabled, the generated code silently skips the comparison step, resulting
in the sort not actually being done:

Sorting a set of 20 pseudo-random rows produces the following output:

{code}
#, row #, key, value
0(0): 11, "0"
1(1): 14, "1"
2(2): 17, "2"
3(3): 0, "3"
{code}

By contrast, the (working) INT type produces the correct results:

{code}
#, row #, key, value
0(3): 0, "3"
1(10): 1, "10"
2(17): 2, "17"
3(4): 3, "4"
{code}

The first number is the output position; the number in parentheses is the row that the sv2 points
to at that position (the sorter should rewrite the sv2 to produce sort order). Sort was done ASC,
NULLS_HIGH, by the key field.

A strong concern here is that there is no error or other warning to the user that Drill cannot
sort this type; Drill just silently declines to perform the operation.


> External sort does not support "obscure" data types
> ---------------------------------------------------
>
>                 Key: DRILL-5329
>                 URL: https://issues.apache.org/jira/browse/DRILL-5329
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
