Date: Tue, 30 Jun 2015 04:49:04 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused

    [ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607197#comment-14607197 ]

Xuefu Zhang commented on HIVE-11095:
------------------------------------

+1

> SerDeUtils another bug ,when Text is reused
> --------------------------------------------
>
>                 Key: HIVE-11095
>                 URL: https://issues.apache.org/jira/browse/HIVE-11095
>             Project: Hive
>          Issue Type: Bug
>          Components: API, CLI
>    Affects Versions: 0.14.0, 1.0.0, 1.2.0
>         Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>            Reporter: xiaowei wang
>            Assignee: xiaowei wang
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11095.1.patch.txt,
>                      HIVE-11095.2.patch.txt, HIVE-11095.3.patch.txt
>
>
> {noformat}
> The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which is the wrong method to use here.
> The method getBytes of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data.
> But copyBytes() was only added after Hadoop 1.
> {noformat}
> How I found this bug:
> When I queried data from an LZO table, I found that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I executed this SQL:
> {code:sql}
> select * from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains content from the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635] session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by the Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\U0000'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
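
For illustration, the buffer-reuse problem described in the quoted report can be reproduced without Hadoop. The sketch below is not the HIVE-11095 patch and does not use Hadoop's Text class. ReusableBuffer, decodeWholeArray, and decodeValidPrefix are hypothetical names, and the buffer only mimics the behavior the report describes: a backing array that is reused across rows, where only the first getLength() bytes are valid. UTF-8 is assumed here in place of the GBK encoding from the table definition.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextReuseSketch {

    /** Hypothetical stand-in for a reusable, Text-like buffer (not Hadoop's class). */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Overwrites the contents; the backing array grows but never shrinks. */
        void set(byte[] src) {
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Raw backing array; only the first getLength() bytes are valid. */
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }
    }

    /** Problematic decode: treats the whole backing array as valid data. */
    static String decodeWholeArray(ReusableBuffer buf, Charset cs) {
        return new String(buf.getBytes(), cs);
    }

    /** Bounded decode: reads only the valid prefix of the backing array. */
    static String decodeValidPrefix(ReusableBuffer buf, Charset cs) {
        return new String(buf.getBytes(), 0, buf.getLength(), cs);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        ReusableBuffer buf = new ReusableBuffer();

        // A long row followed by a shorter row written into the same buffer.
        buf.set("session=3151,thread=254".getBytes(cs));
        buf.set("session=901".getBytes(cs));

        // The first call includes stale bytes left over from the longer row.
        System.out.println(decodeWholeArray(buf, cs));
        System.out.println(decodeValidPrefix(buf, cs));
    }
}
```

Bounding the decode by the valid length avoids an extra array copy, which is one reason to prefer it when a copyBytes()-style method is not available in the Hadoop version in use.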