hive-user mailing list archives

From Adam Hunt <adamph...@gmail.com>
Subject Re: NPE when reading Parquet using Hive on Tez
Date Tue, 02 Feb 2016 21:02:24 GMT
Hi Gopal,

With the release of 0.8.2, I thought I would give Tez another shot.
Unfortunately, I got the same NPE. I dug a little deeper, and it appears
that the configuration property "columns.types", which is used in
org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(),
is not being set. When I manually set that property in Hive, your example
works fine.

hive> create temporary table x (x int) stored as parquet;
hive> insert into x values(1),(2);
hive> set columns.types=int;
hive> select count(*) from x where x.x > 1;
OK
1
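
As a rough model of why a missing "columns.types" would end in an NPE,
here is a small standalone Java sketch. It is not Hive's code: the class
and method names are made up for illustration, and it only assumes that
the property is looked up and handed to a parser without a null check,
which is what the stack trace suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Simplified model of the failure; NOT Hive's actual code.
class ColumnTypesSketch {

    // Hypothetical stand-in for a type-string tokenizer: it dereferences
    // its argument right away, so a null type string raises an NPE.
    static List<String> parseTypes(String typeString) {
        List<String> types = new ArrayList<>();
        for (String t : typeString.split("[,:]")) { // NPE here if typeString is null
            if (!t.isEmpty()) {
                types.add(t);
            }
        }
        return types;
    }

    // Hypothetical stand-in for the reader's init step: the property is
    // looked up and passed straight to the parser with no null check.
    static List<String> columnTypes(Properties conf) {
        return parseTypes(conf.getProperty("columns.types"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        try {
            columnTypes(conf); // property missing, so getProperty returns null
        } catch (NullPointerException e) {
            System.out.println("NPE: columns.types is not set");
        }
        conf.setProperty("columns.types", "int");
        System.out.println(columnTypes(conf)); // prints [int]
    }
}
```

In this model, setting the property by hand is enough to avoid the NPE,
which matches what I see with the session above.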

I also saw that the configuration parameter parquet.column.index.access is
checked in that same function. Setting that property to "true" also fixes
my issue.

hive> create temporary table x (x int) stored as parquet;
hive> insert into x values(1),(2);
hive> set parquet.column.index.access=true;
hive> select count(*) from x where x.x > 1;
OK
1
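
My guess at why this second setting helps, again as a standalone Java
sketch rather than Hive's code (the names are hypothetical): I am
assuming that when index access is enabled, columns are matched by
position and the type string is never read, so the missing
"columns.types" no longer matters.

```java
import java.util.Properties;

// Simplified model of the two code paths; NOT Hive's actual code.
class IndexAccessSketch {

    // Hypothetical stand-in for the reader's init step.
    static String resolveSchema(Properties conf) {
        boolean indexAccess = Boolean.parseBoolean(
                conf.getProperty("parquet.column.index.access", "false"));
        if (indexAccess) {
            // Assumed behavior: match columns by position; types are not needed.
            return "by-index";
        }
        // Assumed behavior: match columns by name, which needs the type string.
        String types = conf.getProperty("columns.types");
        return "by-name:" + types.trim(); // NPE here if columns.types is missing
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("parquet.column.index.access", "true");
        // columns.types is still unset, but the by-index path never reads it.
        System.out.println(resolveSchema(conf)); // prints by-index
    }
}
```

If that assumption is right, the two workarounds avoid the same null
lookup from different directions.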

Thanks for your help.

Best,
Adam



On Tue, Jan 5, 2016 at 9:10 AM, Adam Hunt <adamphunt@gmail.com> wrote:

> Hi Gopal,
>
> Spark does offer dynamic allocation, but it doesn't always work as
> advertised. My experience with Tez has been more in line with my
> expectations. I'll bring up my issues with Spark on that list.
>
> I tried your example and got the same NPE. It might be a mapr-hive issue.
> Thanks for your help.
>
> Adam
>
> On Mon, Jan 4, 2016 at 12:58 PM, Gopal Vijayaraghavan <gopalv@apache.org>
> wrote:
>
>>
>> > select count(*) from alexa_parquet;
>>
>> > Caused by: java.lang.NullPointerException
>> >    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.tokenize(TypeInfoUtils.java:274)
>> >    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.<init>(TypeInfoUtils.java:293)
>> >    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:764)
>> >    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getColumnTypes(DataWritableReadSupport.java:76)
>> >    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:220)
>> >    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
>>
>> This might be an NPE triggered off by a specific case of the type parser.
>>
>> I tested it out on my current build with simple types and it looks like
>> the issue needs more detail on the column types for a repro.
>>
>> hive> create temporary table x (x int) stored as parquet;
>> hive> insert into x values(1),(2);
>> hive> select count(*) from x where x.x > 1;
>> Status: DAG finished successfully in 0.18 seconds
>> OK
>> 1
>> Time taken: 0.792 seconds, Fetched: 1 row(s)
>> hive>
>>
>> Do you have INT96 in the schema?
>>
>> > I'm currently evaluating Hive on Tez as an alternative to keeping the
>> > SparkSQL thrift server running all the time, locking up resources.
>>
>> Tez has a tunable value in tez.am.session.min.held-containers (e.g.
>> something small like 10).
>>
>> And HiveServer2 can be made to work similarly, because Spark's
>> HiveThriftServer2.scala is a wrapper around Hive's ThriftBinaryCLIService.
>>
>>
>>
>>
>>
>>
>> Cheers,
>> Gopal
>>
>>
>>
>
