drill-issues mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5329) External sort does not support "obscure" data types
Date Thu, 04 May 2017 16:32:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997031#comment-15997031 ]

Rahul Challapalli commented on DRILL-5329:
------------------------------------------

[~Paul.Rogers] Among the types that do not work, which ones do you intend to support? There
are Parquet writers that generate Parquet files with some of these types. One option would be
to say that we support any type that can be generated by Hive and Spark, and work towards
that goal. Thoughts?

> External sort does not support "obscure" data types
> ---------------------------------------------------
>
>                 Key: DRILL-5329
>                 URL: https://issues.apache.org/jira/browse/DRILL-5329
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>
> A unit test was created to exercise the "Sorter" mechanism within the External Sort, which is used to sort each incoming batch. The sorter was tested with each Drill data type.
> The following types fail:
> * TINYINT
> * UINT1
> * SMALLINT
> * UINT2
> * UINT4
> * UINT8
> * VAR16CHAR
> * DECIMAL28SPARSE
> * DECIMAL38SPARSE
> The types that work include:
> * INT
> * BIGINT
> * FLOAT4
> * FLOAT8
> * DECIMAL9
> * DECIMAL18
> * VARCHAR
> * VARBINARY
> * DATE
> * TIME
> * TIMESTAMP
> * INTERVAL
> * INTERVALDAY
> * INTERVALYEAR
> Could not find a way to test the following:
> * DECIMAL28DENSE
> * DECIMAL38DENSE
> * LIST
> * MAP
> * GENERIC_OBJECT
> * UNION
> Not yet supported in Drill:
> * MONEY
> * FIXEDCHAR
> * FIXED16CHAR
> * FIXEDBINARY
> * NULL
> * TIMETZ
> * TIMESTAMPTZ
> * LATE
> The failure manifests in one of two ways:
> * If dynamic UDFs are enabled, the query crashes with an NPE. (See DRILL-5331.)
> * If dynamic UDFs are disabled, the generated code silently skips the comparison step, so the sort is not actually performed.
> Sorting a set of 20 pseudo-random rows produces the following output:
> {code}
> #, row #, key, value
> 0(0): 11, "0"
> 1(1): 14, "1"
> 2(2): 17, "2"
> 3(3): 0, "3"
> {code}
> By contrast, the (working) Int type produces the correct results:
> {code}
> #, row #, key, value
> 0(3): 0, "3"
> 1(10): 1, "10"
> 2(17): 2, "17"
> 3(4): 3, "4"
> {code}
> The first number is the row index; the second is the row pointed to by the sv2 (which should be written so as to produce the sort order). The sort was done ASC, NULLS_HIGH, by the key field.
> A strong concern here is that there is no error or other warning to the user that Drill cannot sort this type; Drill just silently declines to perform the operation.
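
To make the output format and the concern above concrete, here is a minimal sketch of sorting
through a selection vector: the data stays in place and only an array of row indices is
reordered (ASC, NULLS_HIGH). It is not Drill's generated code. The class name, the
`sortIndices` method, and the comparator registry keyed by type name are hypothetical,
introduced only for illustration. The point is that an unsupported key type raises an error
instead of silently leaving the rows unsorted.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a selection-vector sort that fails loudly for unsupported types.
class Sv2SortSketch {

    // Hypothetical registry of per-type comparators (assumption, not Drill's mechanism).
    private static final Map<String, Comparator<Object>> COMPARATORS = new HashMap<>();
    static {
        COMPARATORS.put("INT", (a, b) -> Integer.compare((Integer) a, (Integer) b));
        COMPARATORS.put("BIGINT", (a, b) -> Long.compare((Long) a, (Long) b));
    }

    // Returns row indices in sort order (ASC, nulls sorted high); keys are not moved.
    public static int[] sortIndices(String typeName, Object[] keys) {
        Comparator<Object> valueOrder = COMPARATORS.get(typeName);
        if (valueOrder == null) {
            // Fail fast rather than skipping the comparison step.
            throw new UnsupportedOperationException(
                "Sort does not support key type " + typeName);
        }
        // NULLS_HIGH with an ascending sort means nulls come last.
        Comparator<Object> nullsHigh = Comparator.nullsLast(valueOrder);

        // Start with the identity mapping: position i points to row i.
        Integer[] sv = new Integer[keys.length];
        for (int i = 0; i < sv.length; i++) {
            sv[i] = i;
        }
        // Reorder the indices by comparing the keys they point to.
        Arrays.sort(sv, (l, r) -> nullsHigh.compare(keys[l], keys[r]));

        int[] result = new int[sv.length];
        for (int i = 0; i < sv.length; i++) {
            result[i] = sv[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // First four keys from the unsorted output in the issue: 11, 14, 17, 0.
        Object[] keys = {11, 14, 17, 0};
        int[] sv = sortIndices("INT", keys);
        for (int i = 0; i < sv.length; i++) {
            // Same layout as the issue's listing: position(row): key
            System.out.println(i + "(" + sv[i] + "): " + keys[sv[i]]);
        }
        try {
            sortIndices("TINYINT", new Object[] {(byte) 3, (byte) 1});
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the four keys above, the selection vector comes back as 3, 0, 1, 2, so the first output
position points at row 3 (key 0), while the TINYINT call reports an error rather than
returning the rows in their original order.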



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
