Reply-To: dev@hive.apache.org
Date: Thu, 9 Jan 2020 08:04:00 +0000 (UTC)
From: "Krisztian Kasa (Jira)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-22489) Reduce Sink operator should order nulls by parameter

[ https://issues.apache.org/jira/browse/HIVE-22489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-22489:
----------------------------------
Status: Open (was: Patch Available)

> Reduce Sink operator should order nulls by parameter
> -----------------------------------------------------
>
> Key: HIVE-22489
> URL: https://issues.apache.org/jira/browse/HIVE-22489
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Attachments: HIVE-22489.1.patch, HIVE-22489.10.patch, HIVE-22489.10.patch, HIVE-22489.2.patch, HIVE-22489.3.patch, HIVE-22489.3.patch, HIVE-22489.4.patch, HIVE-22489.5.patch, HIVE-22489.6.patch, HIVE-22489.7.patch, HIVE-22489.8.patch, HIVE-22489.9.patch, HIVE-22489.9.patch
>
>
> When the property hive.default.nulls.last is set to true and no null order is explicitly specified in the ORDER BY clause of the query, null ordering should be NULLS LAST.
> However, some of the Reduce Sink operators still order nulls first.
> {code}
> SET hive.default.nulls.last=true;
> EXPLAIN EXTENDED
> SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key LIMIT 5;
> {code}
> {code}
> PREHOOK: query: EXPLAIN EXTENDED
> SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
> PREHOOK: type: QUERY
> PREHOOK: Input: default@src
> #### A masked pattern was here ####
> POSTHOOK: query: EXPLAIN EXTENDED
> SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@src
> #### A masked pattern was here ####
> OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
> FROM (SELECT `key`
> FROM `default`.`src`
> WHERE `key` IS NOT NULL) AS `t0`
> INNER JOIN (SELECT `key`, `value`
> FROM `default`.`src`
> WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
> ORDER BY `t0`.`key`
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> #### A masked pattern was here ####
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> #### A masked pattern was here ####
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: src1
> filterExpr: key is not null (type: boolean)
> Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
> GatherStats: false
> Filter Operator
> isSamplingPred: false
> predicate: key is not null (type: boolean)
> Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
> tag: 0
> auto parallelism: true
> Execution mode: vectorized, llap
> LLAP IO: no inputs
> Path -> Alias:
> #### A masked pattern was here ####
> Path -> Partition:
> #### A masked pattern was here ####
> Partition
> base file name: src
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns key,value
> columns.comments 'default','default'
> columns.types string:string
> #### A masked pattern was here ####
> name default.src
> numFiles 1
> numRows 500
> rawDataSize 5312
> serialization.ddl struct src { string key, string value}
> serialization.format 1
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 5812
> #### A masked pattern was here ####
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns key,value
> columns.comments 'default','default'
> columns.types string:string
> #### A masked pattern was here ####
> name default.src
> numFiles 1
> numRows 500
> rawDataSize 5312
> serialization.ddl struct src { string key, string value}
> serialization.format 1
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 5812
> #### A masked pattern was here ####
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.src
> name: default.src
> Truncated Path -> Alias:
> /src [src1]
> Map 4
> Map Operator Tree:
> TableScan
> alias: src2
> filterExpr: key is not null (type: boolean)
> Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
> GatherStats: false
> Filter Operator
> isSamplingPred: false
> predicate: key is not null (type: boolean)
> Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: key (type: string), value (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
> tag: 1
> value expressions: _col1 (type: string)
> auto parallelism: true
> Execution mode: vectorized, llap
> LLAP IO: no inputs
> Path -> Alias:
> #### A masked pattern was here ####
> Path -> Partition:
> #### A masked pattern was here ####
> Partition
> base file name: src
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns key,value
> columns.comments 'default','default'
> columns.types string:string
> #### A masked pattern was here ####
> name default.src
> numFiles 1
> numRows 500
> rawDataSize 5312
> serialization.ddl struct src { string key, string value}
> serialization.format 1
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 5812
> #### A masked pattern was here ####
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns key,value
> columns.comments 'default','default'
> columns.types string:string
> #### A masked pattern was here ####
> name default.src
> numFiles 1
> numRows 500
> rawDataSize 5312
> serialization.ddl struct src { string key, string value}
> serialization.format 1
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 5812
> #### A masked pattern was here ####
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.src
> name: default.src
> Truncated Path -> Alias:
> /src [src2]
> Reducer 2
> Execution mode: llap
> Needs Tagging: false
> Reduce Operator Tree:
> Merge Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col0, _col2
> Position of Big Table: 1
> Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: string), _col2 (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> null sort order: z
> sort order: +
> Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
> tag: -1
> value expressions: _col1 (type: string)
> auto parallelism: false
> Reducer 3
> Execution mode: vectorized, llap
> Needs Tagging: false
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> GlobalTableId: 0
> #### A masked pattern was here ####
> NumFilesPerFileSink: 1
> Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
> #### A masked pattern was here ####
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> properties:
> columns _col0,_col1
> columns.types string:string
> escape.delim \
> hive.serialization.extend.additional.nesting.levels true
> serialization.escape.crlf true
> serialization.format 1
> serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> TotalFiles: 1
> GatherStats: false
> MultiFileSpray: false
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
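
A note on reading the plan quoted above: in Hive's EXPLAIN output, each character of "null sort order" describes one key column, where 'a' means nulls first and 'z' means nulls last. The Reduce Output Operators in Map 1 and Map 4 report 'a', while the one in Reducer 2 reports 'z'. The following is only a rough sketch for spotting such operators in plan text; the function names and the shortened sample plan are hypothetical and are not part of Hive or of the attached patches.

```python
import re

# Assumption for this sketch: plan text is plain EXPLAIN output, optionally
# prefixed with "> " as in a quoted email. 'a' = nulls first, 'z' = nulls last.

def null_sort_orders(plan_text):
    """Return (vertex, null sort order) for each 'null sort order' line found."""
    results = []
    vertex = None
    for raw in plan_text.splitlines():
        line = raw.lstrip("> ").strip()
        # A vertex header is a line consisting only of e.g. "Map 1" or "Reducer 2".
        if re.fullmatch(r"(Map|Reducer) \d+", line):
            vertex = line
            continue
        match = re.fullmatch(r"null sort order: ([az]*)", line)
        if match:
            results.append((vertex, match.group(1)))
    return results


def nulls_first_sinks(plan_text):
    """Keep only the operators that sort at least one key column nulls first."""
    return [(v, order) for v, order in null_sort_orders(plan_text) if "a" in order]


# Shortened, hypothetical excerpt in the shape of the plan quoted above.
sample = """\
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> Map 1
> Reduce Output Operator
> null sort order: a
> Map 4
> Reduce Output Operator
> null sort order: a
> Reducer 2
> Reduce Output Operator
> null sort order: z
"""

print(nulls_first_sinks(sample))
```

Run against the full plan, a helper like this would list the Map 1 and Map 4 operators, which are the ones this issue is about.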