Date: Tue, 29 Aug 2017 09:55:00 +0000 (UTC)
From: "Zhizhen Hou (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

[ https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhizhen Hou updated HIVE-16332:
-------------------------------
    Status: Patch Available  (was: In Progress)

IMHO, ArrayList.ensureCapacity does not clear the data left over from the previous row. When the array in the current row has fewer elements than the array in the previous row, the list is only partially overwritten, and the elements that were not overwritten are output as well.

> When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16332
>                 URL: https://issues.apache.org/jira/browse/HIVE-16332
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC
>    Affects Versions: 2.1.1
>            Reporter: Zhizhen Hou
>            Assignee: Zhizhen Hou
>            Priority: Critical
>              Labels: patch
>         Attachments: HIVE-16332.1.patch
>
>
> ## Steps to reproduce
> 1. First, create a text format table with an array type field in Hive.
> ```
> create table test_text_orc (
>   col_int bigint,
>   col_text string,
>   col_array array<string>,
>   col_map map<string,string>
> )
> PARTITIONED BY (
>   day string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> collection items TERMINATED BY ']'
> map keys TERMINATED BY ':'
> ;
>
> ```
> 2. Create a new text file, hive-orc-text-file-array-error-test.txt.
> ```
> 1,text_value1,array_value1]array_value2]array_value3, map_key1:map_value1,map_key2:map_value2
> 2,text_value2,array_value4, map_key1:map_value3
> ,text_value3,, map_key1:]map_key3:map_value3
> ```
> 3. Load the data into one partition.
> ```
> LOAD DATA local INPATH '.hive-orc-text-file-array-error-test.txt' overwrite into table test_text_orc partition(day=20170329)
> ```
> 4. Select the data to verify the result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"] {" map_key1":"map_value1","map_key2":"map_value2"} 20170329
> 2 text_value2 ["array_value4"] {"map_key1":"map_value3"} 20170329
> NULL text_value3 [] {" map_key1":"","map_key3":"map_value3"} 20170329
> ```
> 5. Change the file format of the table to ORC.
> ```
> alter table test_text_orc set fileformat orc;
> ```
> 6. Check the result again. The array column of the second and third rows now contains values left over from the first row.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"] {" map_key1":"map_value1","map_key2":"map_value2"} 20170329
> 2 text_value2 ["array_value4","array_value2","array_value3"] {"map_key1":"map_value3"} 20170329
> NULL text_value3 ["array_value4","array_value2","array_value3"] {"map_key3":"map_value3"," map_key1":""} 20170329
> ```

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
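
The overwrite behaviour described in the comment at the top of this message can be reproduced outside Hive with plain java.util.ArrayList. The sketch below is only an illustration under assumptions: the class StaleListDemo and the methods fillWithoutTruncate and fillWithTruncate are hypothetical names, it assumes the reader reuses one list object across rows, and it is neither Hive's reader code nor the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the names are illustrative and are not Hive's.
class StaleListDemo {

    // Problem pattern: reuse the previous row's list, overwrite in place, never truncate.
    static List<String> fillWithoutTruncate(ArrayList<String> reused, List<String> row) {
        // ensureCapacity only grows the backing array; size() and the elements are unchanged.
        reused.ensureCapacity(row.size());
        for (int i = 0; i < row.size(); i++) {
            if (i < reused.size()) {
                reused.set(i, row.get(i)); // overwrite a slot left by the previous row
            } else {
                reused.add(row.get(i));    // append when the current row is longer
            }
        }
        return reused;
    }

    // One possible fix (an assumption, not the patch): drop the tail of a longer previous row.
    static List<String> fillWithTruncate(ArrayList<String> reused, List<String> row) {
        fillWithoutTruncate(reused, row);
        if (reused.size() > row.size()) {
            reused.subList(row.size(), reused.size()).clear();
        }
        return reused;
    }

    public static void main(String[] args) {
        // Array values of the three rows from the reproduction steps.
        List<String> row1 = Arrays.asList("array_value1", "array_value2", "array_value3");
        List<String> row2 = Arrays.asList("array_value4");
        List<String> row3 = Collections.emptyList();

        ArrayList<String> stale = new ArrayList<>();
        System.out.println(fillWithoutTruncate(stale, row1));
        System.out.println(fillWithoutTruncate(stale, row2)); // keeps array_value2, array_value3
        System.out.println(fillWithoutTruncate(stale, row3)); // still shows row2's stale result

        ArrayList<String> clean = new ArrayList<>();
        System.out.println(fillWithTruncate(clean, row1));
        System.out.println(fillWithTruncate(clean, row2));    // [array_value4]
        System.out.println(fillWithTruncate(clean, row3));    // []
    }
}
```

With the first method, the second and third rows print [array_value4, array_value2, array_value3], which matches the wrong output in step 6. Truncating with subList(...).clear() is one way to restore the expected values; calling clear() on the list before refilling it would be another.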