drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6154) NaN, Infinity issues
Date Sat, 24 Feb 2018 01:16:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375190#comment-16375190

ASF GitHub Bot commented on DRILL-6154:

Github user asfgit closed the pull request at:


> NaN, Infinity issues
> --------------------
>                 Key: DRILL-6154
>                 URL: https://issues.apache.org/jira/browse/DRILL-6154
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Volodymyr Tkach
>            Assignee: Volodymyr Tkach
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
> 1.Issue
>  - *sqrt*
>  - *trunc*
> *ISSUE_DESCRIPTION:* According to DRILL-5919/ new json number literals were added: *NaN,
Infinity, -Infinity*. The new data types must be processed properly by existing functions.
There are a few issues:
>  *1. SQRT function*. Run the following test query:
> {code:java}
> select sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, sqrt(Negative_Infinity) NEG_INF
from dfs.tmp.`PN_Inf_NaN.json`{code}
>  - EXPECTED_RESULT: it was expected to get the following result: _NaN, Infinity, NaN_
(expected result is based on java Math.sqrt() method)
>  - ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_
> *2. TRUNC function*. According to DRILL docs ([https://drill.apache.org/docs/math-and-trig/):]
_TRUNC(x, y) : Truncates x to y decimal places. *Specifying y is optional. Default is 1*_.
So, the function must work properly without specifying *y*
>  However an error message appears. Run test_query:
> {code:java}
> select trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, trunc(Negative_Infinity) NEG_INF
from dfs.tmp.`PN_Inf_NaN.json`{code}
>  - EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, NaN*
>  - ACTUAL_RESULT: it appears the following error message: *Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException
Fragment 0:0 [Error Id: 95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*
> Please investigate and fix, test file attached PN_Inf_NaN.json
> 2. Issue 
> *ISSUE_DESCRIPTION:* There were added new Json data types in DRILL-5919: *NaN, Infinity,
>  During testing activities, it was detected a bit strange behavior of INNER JOIN operator
- different query results in almost the same queries. 
>  *Query1*
> {code:java}
>  select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json`
tt on t.attr4 = tt.attr4 {code}
> *Query2*
> {code:java}
>  select distinct t.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt
on t.attr4 = tt.attr4 {code}
> *Query1* differs from *Query2* by 1 columns only:
>  - In *Query1* - 2 columns are selected - t.name, tt.name
>  - In *Query2* - 1 column is selected - t.name
> However *Query1*/*Query2* return completely different results:
>  - *Query1* returns
> {code:java}
> 	name         name0
> 	object2 	object2
> 	object2 	object3
> 	object2 	object4
> 	object3 	object2
> 	object3 	object3
> 	object3 	object4
> 	object4 	object2
> 	object4 	object3
> 	object4 	object4
> 	{code}
> This result seems to be correct.
>  - *Query2* returns _*No result found*_, not expected:
> {code:java}
> 	name
> 	object2
> 	object3
> 	object4
> 	{code}
> {code:java}
> No result found{code}
> *NB!:* the issue appears only if tables are _*JOINed by a column which contains newly-added
data types (NaN, Infinity, -Infinity)*_. The issue is not reproducible is a user is JOINing
tables by a column containing other data types
> 3. Issue
> *ISSUE_DESCRIPTION:* 'ORDER BY/DESC' clause behaves in different ways when sorting columns
containing NaN values. In one case it considers NaN to be the largest value, in another -
the smallest one. 
>  *Steps:*
>  - Select from the attached test file (orderBy.json, attached)
> {code:java}
> SELECT name, attr4 from dfs.tmp.`orderBy.json` order by name, attr4{code}
>  - Check the attached screen shot (orderByIssue.jpg):
>  *EXPECTED_RESULT:* It was expected the 'ORDER BY' clause to sort attr4 columns data
in the same way (most probably NaN should be the largest, see *NB*)
>  *ACTUAL_RESULT:* attr4 column's values were sorted in different ways: for 'obj1'/'obj3'
NaN is the largest, for 'obj2'/'obj4' NaN is the smallest.
> *NB:* Postgres as well as Java's sorting (Collection.sort() / Arrays.sort() methods)
treats NaN as the largest value
> 4. Issue
> *AFFECTED_FUNCTIONALITY:* max(column_name), min(column_name),
> *ISSUE_DESCRIPTION:* min/max aggregation functions return the same result if the selected
column contains NaN value.
> {code:java}
> SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by name{code}
> Result
> {code:java}
> name	Min	Max
> obj1	NaN	NaN
> obj2	NaN	NaN
> obj3	NaN	NaN
> obj4	NaN	NaN
> {code}
> As for me, this logic should be revised, current behavior is a bit confusing:
>  - Postgres considers NaN to be the largest value, so {{MAX(col_withNaN)}} will return
NaN, {{MIN(col_withNaN)}} - will return other value, so DRILL's min/max logic can be adjusted
to the Postgres' one, see postgres.jpg
>  - Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs

This message was sent by Atlassian JIRA

View raw message