hive-user mailing list archives

From Tom Nichols <tmnich...@gmail.com>
Subject Re: All Map jobs fail with NPE in LazyStruct.uncheckedGetField
Date Mon, 15 Mar 2010 15:42:38 GMT
Just a follow-up here -- when I upgraded to Hive 0.5 everything
worked...  Thanks again for the help.

On Fri, Mar 5, 2010 at 5:04 AM, Zheng Shao <zshao9@gmail.com> wrote:
> Do you want to try Hive release 0.5.0 or Hive trunk?
> We should have provided better error messages here:
> https://issues.apache.org/jira/browse/HIVE-1216
>
> Zheng
>
> On Thu, Mar 4, 2010 at 12:34 PM, Tom Nichols <tmnichols@gmail.com> wrote:
>> I am trying out Hive, using Cloudera's EC2 distribution (Hadoop
>> 0.18.3, Hive 0.4.1, I believe).
>>
>> I'm trying to run the following query, which causes every map task to
>> fail with an NPE before making any progress:
>>
>> java.lang.NullPointerException
>>        at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:205)
>>        at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:182)
>>        at org.apache.hadoop.hive.serde2.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:141)
>>        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:53)
>>        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:74)
>>        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>>        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:49)
>>        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
>>        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:175)
>>        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>
>>
>> The query:
>> -- Get the node's max price and corresponding year/day/hour/month
>> select isone.node_id, isone.day, isone.hour, isone.lmp
>> from (select max(lmp) as mlmp, node_id
>>       from isone_lmp
>>       where isone_lmp.node_id = 400
>>       group by node_id) maxlmp
>> join isone_lmp isone on ( isone.node_id = maxlmp.node_id
>>                           and isone.lmp = maxlmp.mlmp );
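>>
>> For reference, since the subquery filters to a single node_id, the
>> same result (modulo ties on the max) could also be expressed without
>> the self-join -- just a rough, untested sketch; note that ORDER BY in
>> Hive runs everything through a single reducer, so the join form may
>> scale better on the full data set:
>>
>> -- sketch: single-node max without the self-join
>> select node_id, day, hour, lmp
>> from isone_lmp
>> where node_id = 400
>> order by lmp desc
>> limit 1;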
>>
>> The table:
>> CREATE TABLE isone_lmp (
>>  node_id int,
>>  day string,
>>  hour int,
>>  minute int,
>>  energy float,
>>  congestion float,
>>  loss float,
>>  lmp float
>> )
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY ','
>> STORED AS TEXTFILE;
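>>
>> A table like this is typically populated with something along these
>> lines (the HDFS path here is just a placeholder):
>>
>> -- hypothetical path:
>> LOAD DATA INPATH '/path/to/isone_lmp_data' INTO TABLE isone_lmp;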
>>
>> The data looks like the following:
>> 396,20090120,00,00,62.77,0,.78,63.55
>> 397,20090120,00,00,62.77,0,.65,63.42
>> 398,20090120,00,00,62.77,0,.65,63.42
>> 399,20090120,00,00,62.77,0,.65,63.42
>> 400,20090120,00,00,62.77,0,.65,63.42
>> 401,20090120,00,00,62.77,0,-1.02,61.75
>> 405,20090120,00,00,62.77,0,.21,62.98
>>
>> It's about 15GB of data total; I can run a simple "select count(1)
>> from isone_lmp;", which executes as expected. Any thoughts? I've been
>> able to execute the same query on a smaller subset of data (2M rows
>> as opposed to 500M) on a non-distributed setup locally.
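>>
>> In case it helps narrow things down: with the lazy SerDe, fields that
>> are missing or fail to parse come back as NULL, so a quick scan for
>> malformed rows might look like this (just a sketch, untested):
>>
>> select count(1) from isone_lmp
>> where node_id is null or lmp is null;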
>>
>> Thanks.
>> -Tom
>>
>
>
>
> --
> Yours,
> Zheng
>
