Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A9896173F5 for ; Thu, 19 Feb 2015 21:14:30 +0000 (UTC) Received: (qmail 11009 invoked by uid 500); 19 Feb 2015 21:14:25 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 10946 invoked by uid 500); 19 Feb 2015 21:14:25 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 10936 invoked by uid 99); 19 Feb 2015 21:14:25 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 Feb 2015 21:14:25 +0000 X-ASF-Spam-Status: No, hits=3.4 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,HK_RANDOM_ENVFROM,HK_RANDOM_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of teddyyyy123@gmail.com designates 209.85.223.169 as permitted sender) Received: from [209.85.223.169] (HELO mail-ie0-f169.google.com) (209.85.223.169) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 Feb 2015 21:13:59 +0000 Received: by iecvy18 with SMTP id vy18so3229708iec.6 for ; Thu, 19 Feb 2015 13:13:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=wCLk8fTeae2kbcxq0NtjbFev0BwPg4q+CZsSu78iIfo=; b=B5rsl3ZmjAa90P+CNrn18yDC0/puddKAzXEzFSCcFL3Dumi6i1za57Z5cygUzjoV6G vznGiG3I/Pc25ZNqvEI3yDOmQEE1Jk4RbxciirRhgZ9vsA0ElQm4KZ4gtV3QzySTCOpd M9GHS72uwIjLXyXmFYzfRKcLIRTc5OnY2l2HjFOkfixT9I7iJ3nMQRkbqzcFhedFH3wL 4dSwWVthrqByzy+CyM7fIqwlxCPt1mnD9xRJv7vusrKYpCkLFTvDwwJtp4ptHuzBTK/r JmiD4+J/V1aEKv6sD6Zr9c1ks+9KHOthMyOj7TdYgQe9u+meAlzI1XC9X44bkme5tGIu /jRA== X-Received: by 10.107.33.11 with SMTP id h11mr8985313ioh.53.1424380438025; Thu, 19 Feb 2015 13:13:58 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.58.135 with HTTP; Thu, 19 Feb 2015 13:13:37 -0800 (PST) In-Reply-To: References: From: Yang Date: Thu, 19 Feb 2015 13:13:37 -0800 Message-ID: Subject: Re: select on parquet hive tables always gives NULL ? To: user@hive.apache.org Content-Type: multipart/alternative; boundary=001a11403ed2089cb8050f776a65 X-Virus-Checked: Checked by ClamAV on apache.org --001a11403ed2089cb8050f776a65 Content-Type: text/plain; charset=UTF-8 ah... found out. my issue is that hive 0.13 doesn't handle this correctly. could be a bug. used 0.14, it works. btw the UNION[int, null] translates to parquet as a field "optional int32 myfieldName", I found this by calling ParquetFileReader.readFooter() On Thu, Feb 19, 2015 at 11:32 AM, Yang wrote: > I created a parquet file, expose that to hive using an external table, but > select from such tables are always giving NULL. > > > to show the symptom, I created the following data set , each record has > only 2 fields __PRIMARY_KEY__ and nullableInt. the schema represented in > avro is the following (I converted the data into parquet through the > avro-parquet convertor) > > {"type":"record","name":"mytest","namespace":"yy.com > ","doc":"","fields":[{"name":"__PRIMARY_KEY__","type":"string","doc":""},{"name":"nullableInt","type":["int","null"],"doc":""}],"version":"1424373511441"} > > > > the following is the parquet hive table def. I also attached the sample > parquet file. > > Thanks! > yang > > > drop table mytest; > CREATE EXTERNAL TABLE IF NOT EXISTS mytest > ( > PRIMARY_KEY String, > nullableInt int > ) > STORED AS PARQUET > LOCATION '/user/myusername/camus/topics/mytest/hourly/2015/02/19/11/' > ; > > select * from mytest limit 10; > > > > --001a11403ed2089cb8050f776a65 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
ah... found out. =C2=A0my issue is that hive 0.13 doesn= 9;t handle this correctly. could be a bug.

used 0.14, it= works.

btw the UNION[int, null] translates to par= quet as a field "optional int32 myfieldName", I found this by cal= ling ParquetFileReader.readFooter()


On Thu, Feb 19, 2015 at 11:32 = AM, Yang <teddyyyy123@gmail.com> wrote:
I created a parquet file, expose th= at to hive using an external table, but select from such tables are always = giving NULL.


to show the symptom, I= created the following data set , each record has only 2 fields =C2=A0__PRI= MARY_KEY__ and nullableInt. =C2=A0the schema represented in avro is the fol= lowing (I converted the data into parquet through the avro-parquet converto= r)

{"type":"record","name= ":"mytest","namespace":"yy.com","doc":"","= fields":[{"name":"__PRIMARY_KEY__","type"= ;:"string","doc":""},{"name":"= nullableInt","type":["int","null"],"= ;doc":""}],"version":"1424373511441"}



the following is the p= arquet hive table def.=C2=A0 I also attached the sample parquet file.
=

Thanks!
yang


drop table mytest;
CREATE EXTERNAL TABLE IF NOT EXISTS my= test
(
PRIMARY_KEY String,
nullableInt =C2=A0= =C2=A0 int
)
=C2=A0 STORED AS =C2=A0PARQUET
= LOCATION '/user/myusername/camus/topics/mytest/hourly/2015/02/19/11/= 9;
;

select * from mytest limit 10;




--001a11403ed2089cb8050f776a65--