hive-user mailing list archives

From Tom Nichols <tmnich...@gmail.com>
Subject All Map jobs fail with NPE in LazyStruct.uncheckedGetField
Date Thu, 04 Mar 2010 20:34:55 GMT
I am trying out Hive using Cloudera's EC2 distribution (Hadoop
0.18.3 and Hive 0.4.1, I believe).

I'm trying to run the query shown below the stack trace; every map
task fails with this NPE before making any progress:

java.lang.NullPointerException
	at org.apache.hadoop.hive.serde2.lazy.LazyStruct.uncheckedGetField(LazyStruct.java:205)
	at org.apache.hadoop.hive.serde2.lazy.LazyStruct.getField(LazyStruct.java:182)
	at org.apache.hadoop.hive.serde2.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:141)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:53)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:74)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:49)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:332)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:175)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)


The query:
-- Get the node's max price and corresponding year/day/hour/month
select isone.node_id, isone.day, isone.hour, isone.lmp
from (select max(lmp) as mlmp, node_id
    from isone_lmp
    where isone_lmp.node_id = 400
    group by node_id) maxlmp
join isone_lmp isone on ( isone.node_id = maxlmp.node_id
  and isone.lmp=maxlmp.mlmp );
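
For reference, the intent of the query can be sketched in plain Python:
find the maximum lmp for one node, then return every row for that node
whose lmp equals that maximum. This is only an illustration of the
intended result, not of how Hive plans or executes the join. The rows
and the helper name max_lmp_rows are made up for the example.

```python
# Hypothetical rows as (node_id, day, hour, lmp); the values are invented
# for illustration and are not taken from the real data set.
rows = [
    (400, "20090120", 0, 63.42),
    (400, "20090120", 1, 70.10),
    (400, "20090121", 5, 70.10),
    (401, "20090120", 0, 99.00),
]


def max_lmp_rows(rows, node_id):
    """Return every row for node_id whose lmp equals that node's max lmp."""
    # Corresponds to: where node_id = <node_id>
    node_rows = [r for r in rows if r[0] == node_id]
    if not node_rows:
        return []
    # Corresponds to the subquery: select max(lmp) as mlmp ... group by node_id
    mlmp = max(r[3] for r in node_rows)
    # Corresponds to the join on node_id and lmp = mlmp
    return [r for r in node_rows if r[3] == mlmp]


result = max_lmp_rows(rows, 400)
```

Ties are kept on purpose: if the maximum price occurs in more than one
hour, the join returns one row per occurrence.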

The table:
CREATE TABLE isone_lmp (
  node_id int,
  day string,
  hour int,
  minute int,
  energy float,
  congestion float,
  loss float,
  lmp float
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

The data looks like the following:
396,20090120,00,00,62.77,0,.78,63.55
397,20090120,00,00,62.77,0,.65,63.42
398,20090120,00,00,62.77,0,.65,63.42
399,20090120,00,00,62.77,0,.65,63.42
400,20090120,00,00,62.77,0,.65,63.42
401,20090120,00,00,62.77,0,-1.02,61.75
405,20090120,00,00,62.77,0,.21,62.98
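
A minimal Python sketch of how each line is meant to map onto the eight
columns of the table above. It also flags lines that do not fit that
layout, which could be used to spot-check a sample of the input files.
It is an illustration only, not a diagnosis of the NPE: the names
LmpRow and parse_line are hypothetical, and Python's int/float parsing
rules are not necessarily the same as Hive's.

```python
from typing import NamedTuple, Optional


class LmpRow(NamedTuple):
    """One row, with fields in the same order as the CREATE TABLE columns."""
    node_id: int
    day: str
    hour: int
    minute: int
    energy: float
    congestion: float
    loss: float
    lmp: float


EXPECTED_FIELDS = 8  # number of columns declared in the table


def parse_line(line: str) -> Optional[LmpRow]:
    """Parse one comma-delimited line; return None if it does not fit."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != EXPECTED_FIELDS:
        return None
    try:
        return LmpRow(
            int(parts[0]), parts[1], int(parts[2]), int(parts[3]),
            float(parts[4]), float(parts[5]), float(parts[6]), float(parts[7]),
        )
    except ValueError:
        # A numeric column held text that Python could not convert.
        return None


# Two lines copied from the sample data above.
sample = [
    "396,20090120,00,00,62.77,0,.78,63.55",
    "401,20090120,00,00,62.77,0,-1.02,61.75",
]
parsed = [parse_line(s) for s in sample]

# Made-up malformed lines: too few fields, and an empty line.
bad = [s for s in ["400,20090120,00", ""] if parse_line(s) is None]
```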

The table holds about 15GB of data in total. A simple "select count(1)
from isone_lmp;" executes as expected, and I've been able to run the
same failing query on a smaller subset of the data (2M rows as opposed
to 500M) on a local, non-distributed setup.  Any thoughts?

Thanks.
-Tom
