drill-issues mailing list archives

From "Mehant Baid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2504) Aggregate query with grouping results in Error
Date Tue, 21 Apr 2015 22:12:59 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-2504:
-------------------------------
    Assignee: Venki Korukanti  (was: Mehant Baid)

> Aggregate query with grouping results in Error
> ----------------------------------------------
>
>                 Key: DRILL-2504
>                 URL: https://issues.apache.org/jira/browse/DRILL-2504
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>         Environment: 4 node cluster
> {code}
> 0: jdbc:drill:> select * from sys.version;
> +------------+----------------+-------------+-------------+------------+
> | commit_id  | commit_message | commit_time | build_email | build_time |
> +------------+----------------+-------------+-------------+------------+
> | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown     | 09.03.2015 @ 04:52:49 EDT |
> +------------+----------------+-------------+-------------+------------+
> 1 row selected (0.062 seconds)
> {code}
>            Reporter: Khurram Faraaz
>            Assignee: Venki Korukanti
>             Fix For: 1.0.0
>
>
> The aggregate query below, which groups and mixes DISTINCT and non-DISTINCT aggregates, fails with an exception. Note that `planner.enable_hashagg` was set to false, the query was run against a CSV file, and the cluster had four nodes. Hash aggregation can be re-enabled afterwards with:
> alter system set `planner.enable_hashagg`=true;
> alter session set `planner.enable_hashagg`=true;
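> (To confirm the active value before reproducing, the option can be inspected through the sys.options table; a minimal check, assuming this build exposes bool_val there:)
> {code}
> select name, bool_val from sys.options where name = 'planner.enable_hashagg';
> {code}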
> {code}
> 0: jdbc:drill:> alter system set `planner.enable_hashagg`=false;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | planner.enable_hashagg updated. |
> +------------+------------+
> 1 row selected (0.075 seconds)
> 0: jdbc:drill:> select columns[4], sum(columns[0]), count(distinct columns[1]), max(columns[2]), count(distinct columns[3]), max(columns[5]), min(columns[6]), avg(columns[7])
> . . . . . . . > from `conftest.csv`
> . . . . . . . > group by columns[4];
> Query failed: Query stopped., Failure while trying to materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ 6cd09ba7-3e4b-4b3b-b111-39f74f53e1b0 on centos-01.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
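> (conftest.csv itself is not attached; any CSV with at least eight columns should exercise the same plan shape. A hypothetical stand-in repro, with made-up file path and sample data, not verified:)
> {code}
> -- expected column layout (positions 0-7): decimal, int, bigint, char, varchar, timestamp, float, double
> -- sample row: 123.456,7,8888888888,abc,grp1,2015-03-19 17:47:43,1.5,2.5
> alter session set `planner.enable_hashagg` = false;
> select columns[4], sum(columns[0]), count(distinct columns[1]), count(distinct columns[3])
> from dfs.tmp.`conftest.csv`
> group by columns[4];
> {code}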
> Stack trace from drillbit.log:
> {code}
> 2015-03-19 17:47:43,123 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error bab1babd-48fe-4719-8a77-dc5826027ba7: Failure while running fragment.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:390) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> 2015-03-19 17:47:43,124 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested.  RUNNING --> FAILED
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment., Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
>         at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:95) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:154) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:176) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:123) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> 2015-03-19 17:47:43,141 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested.  FAILED --> COMPLETED
> 2015-03-19 17:47:43,141 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> 2015-03-19 17:47:43,142 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] ERROR o.a.drill.exec.work.foreman.Foreman - Error 82bd01dc-d27d-47e4-836d-426f3625511f: RemoteRpcException: Failure while running fragment., Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment., Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
>         at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:95) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:154) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:176) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:123) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> {code}
> Physical plan for the failing query:
> {code}
> 00-00    Screen : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {832.95 rows, 13844.73454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2049
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], EXPR$7=[$7]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {830.0 rows, 13841.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2048
> 00-02        Project(EXPR$0=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], EXPR$2=[$11], EXPR$3=[$3], EXPR$4=[$9], EXPR$5=[$4], EXPR$6=[$5], EXPR$7=[CAST(/(CastHigh(CASE(=($7, 0), null, $6)), $7)):ANY]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {800.5 rows, 13809.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2047
> 00-03          MergeJoin(condition=[IS NOT DISTINCT FROM($0, $10)], joinType=[inner]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7, ANY EXPR$00, BIGINT EXPR$4, ANY EXPR$01, BIGINT EXPR$2): rowcount = 29.5, cumulative cost = {771.0 rows, 13777.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2046
> 00-05            MergeJoin(condition=[IS NOT DISTINCT FROM($0, $8)], joinType=[inner]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7, ANY EXPR$00, BIGINT EXPR$4): rowcount = 29.5, cumulative cost = {491.70000000000005 rows, 10177.344151873782 cpu, 0.0 io, 0.0 network, 4814.4 memory}, id = 2038
> 00-08              StreamAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)], EXPR$3=[MAX($3)], EXPR$5=[MAX($5)], EXPR$6=[MIN($6)], agg#5=[$SUM0($7)], agg#6=[COUNT($7)]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7): rowcount = 5.9, cumulative cost = {236.0 rows, 6671.303759649394 cpu, 0.0 io, 0.0 network, 3776.0 memory}, id = 2030
> 00-11                Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f1, ANY $f2, ANY $f3, ANY $f4, ANY $f5, ANY $f6, ANY $f7): rowcount = 59.0, cumulative cost = {177.0 rows, 1479.3037596493946 cpu, 0.0 io, 0.0 network, 3776.0 memory}, id = 2029
> 00-14                  Project(EXPR$0=[ITEM($0, 4)], $f1=[ITEM($0, 0)], $f2=[ITEM($0, 1)], $f3=[ITEM($0, 2)], $f4=[ITEM($0, 3)], $f5=[ITEM($0, 5)], $f6=[ITEM($0, 6)], $f7=[ITEM($0, 7)]) : rowType = RecordType(ANY EXPR$0, ANY $f1, ANY $f2, ANY $f3, ANY $f4, ANY $f5, ANY $f6, ANY $f7): rowcount = 59.0, cumulative cost = {118.0 rows, 91.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2028
> 00-17                    Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], `columns`[0], `columns`[1], `columns`[2], `columns`[3], `columns`[5], `columns`[6], `columns`[7]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2027
> 00-07              Project(EXPR$00=[$0], EXPR$4=[$1]) : rowType = RecordType(ANY EXPR$00, BIGINT EXPR$4): rowcount = 1.0, cumulative cost = {248.8 rows, 3478.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2037
> 00-10                StreamAgg(group=[{0}], EXPR$4=[COUNT($1)]) : rowType = RecordType(ANY EXPR$0, BIGINT EXPR$4): rowcount = 1.0, cumulative cost = {247.8 rows, 3470.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2036
> 00-13                  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 5.9, cumulative cost = {241.9 rows, 3376.040392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2035
> 00-16                    StreamAgg(group=[{0, 1}]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 5.9, cumulative cost = {236.0 rows, 3315.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2034
> 00-19                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 59.0, cumulative cost = {177.0 rows, 2843.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2033
> 00-21                        Project(EXPR$0=[ITEM($0, 4)], $f4=[ITEM($0, 3)]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 59.0, cumulative cost = {118.0 rows, 67.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2032
> 00-22                          Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], `columns`[3]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2031
> 00-04            Project(EXPR$01=[$0], EXPR$2=[$1]) : rowType = RecordType(ANY EXPR$01, BIGINT EXPR$2): rowcount = 1.0, cumulative cost = {248.8 rows, 3478.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2045
> 00-06              StreamAgg(group=[{0}], EXPR$2=[COUNT($1)]) : rowType = RecordType(ANY EXPR$0, BIGINT EXPR$2): rowcount = 1.0, cumulative cost = {247.8 rows, 3470.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2044
> 00-09                Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 5.9, cumulative cost = {241.9 rows, 3376.040392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2043
> 00-12                  StreamAgg(group=[{0, 1}]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 5.9, cumulative cost = {236.0 rows, 3315.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2042
> 00-15                    Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 59.0, cumulative cost = {177.0 rows, 2843.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2041
> 00-18                      Project(EXPR$0=[ITEM($0, 4)], $f2=[ITEM($0, 1)]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 59.0, cumulative cost = {118.0 rows, 67.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2040
> 00-20                        Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], `columns`[1]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2039
> {code}
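> (My reading of the plan, not verified against this build: the COUNT(DISTINCT) rewrite is what introduces the MergeJoin operators on IS NOT DISTINCT FROM, and materializing that join condition over the untyped ANY columns appears to need the missing castINT(BIT-OPTIONAL) implementation. The plan above can be regenerated with:)
> {code}
> explain plan for
> select columns[4], sum(columns[0]), count(distinct columns[1]), max(columns[2]),
>        count(distinct columns[3]), max(columns[5]), min(columns[6]), avg(columns[7])
> from `conftest.csv`
> group by columns[4];
> {code}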
> A query over a view that casts the columns returns correct results:
> {code}
> 0: jdbc:drill:> create view v8(col_decimal, col_int, col_bigint, col_char, col_varchar, col_timestmp, col_float, col_double)
> . . . . . . . > as
> . . . . . . . > select cast(columns[0] as decimal(11,3)),
> . . . . . . . > cast(columns[1] as int),
> . . . . . . . > cast(columns[2] as bigint),
> . . . . . . . > cast(columns[3] as char(9)),
> . . . . . . . > cast(columns[4] as varchar(256)),
> . . . . . . . > cast(columns[5] as timestamp),
> . . . . . . . > cast(columns[6] as float),
> . . . . . . . > cast(columns[7] as double)
> . . . . . . . > from `conftest.csv`;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | View 'v8' created successfully in 'dfs.tmp' schema |
> +------------+------------+
> 1 row selected (0.102 seconds)
> 0: jdbc:drill:> select col_varchar, sum(col_decimal), count(distinct col_int)
> . . . . . . . > from v8
> . . . . . . . > group by col_varchar;
> 209 rows selected (0.713 seconds)
> {code}
> 1. The original query over the VIEW returns correct results.
> 2. The original query over the CSV file, with explicit casts in the select list, also returns correct results.
> So the problem appears only when the original aggregate+grouping query runs directly over the CSV file without explicitly casting the columns.
> {code}
> 0: jdbc:drill:> select col_varchar, sum(col_decimal), count(distinct col_int), max(col_bigint), count(distinct col_char), max(col_timestmp), min(col_float), avg(col_double)
> . . . . . . . > from v8
> . . . . . . . > group by col_varchar;
> +-------------+------------+------------+------------+------------+------------+------------+------------+
> | col_varchar |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
> +-------------+------------+------------+------------+------------+------------+------------+------------+
> 209 rows selected (1.286 seconds)
> The query below, with explicit casts on the CSV columns, returns correct results
> 0: jdbc:drill:> select columns[4], sum(cast(columns[0] as decimal(11,3))), count(distinct cast(columns[1] as int)), max(cast(columns[2] as bigint)), count(distinct cast(columns[3] as char(9))), max(cast(columns[5] as timestamp)), min(cast(columns[6] as float)), avg(cast(columns[7] as double))
> . . . . . . . > from `conftest.csv`
> . . . . . . . > group by columns[4];
> 209 rows selected (1.129 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
