hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Grover <mgro...@oanda.com>
Subject Re: Hive JOIN fails if SELECT statement contains fields from the first table.
Date Mon, 16 Jan 2012 13:11:05 GMT
Hi Bing,
Something seems wrong about your create table statements.
You are using "LOAD DATA LOCAL INPATH" to load data into Hive tables. This makes me think
that the files /home/biadmin/hivetbl/student_details.txt and /home/biadmin/hivetbl/student_score.txt
are on the local drive.
In such a case you want to copy them onto HDFS for use by Hive tables and if so, doing "LOAD
DATA LOCAL INPATH" is the right thing to do.

However, when you create a table and specify the location 
...STORED AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl'...
This location refers to the HDFS location. If you don't specify this, Hive will choose this
by for you by default. However, if you do specify it, it's your responsibility to ensure that
this location doesn't conflict with another Hive table.

Long story short, a leaf level directory in HDFS stores the partitions/buckets and contents
of 1 table. In your create table statement, you were pointing both Hive tables to the same
HDFS location /home/biadmin/hivetbl.

If you could make your two tables, point to different HDFS locations like, '/usr/hive/warehouse/student_score'
and '/usr/hive/warehouse/student_details' and then do your load data inpath statements, that
should fix the problem.

Good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "Bing Li" <smallputao@gmail.com>
To: dev@hive.apache.org, user@hive.apache.org
Sent: Monday, January 16, 2012 5:06:00 AM
Subject: Hive JOIN fails if SELECT statement contains fields from the first table.


1. I create two Hive table: 
Hive> CREATE EXTERNAL TABLE student_details (studentid INT,studentname STRING,age INT,gpa
FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl';



Hive>CREATE EXTERNAL TABLE student_score(studentid INT, classid INT,score FLOAT) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl'; 


2. Load data 
HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt' OVERWRITE INTO
TABLE student_details; 


HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE INTO TABLE
student_score; 


3. Run inner join 
Hive> SELECT a.studentid,a.studentname,a.age,b.classid,b.score,c.classname FROM student_details
a JOIN student_score b ON (a.studentid = b.studentid); 


Result: 
There are the following exception: 
cannot find field studentname from [0:studentid, 1:classid, 2:score] 


[My Question]: studentname is a field of the table student_details (The first table), why
search it in the table student_score(the second table)? 


log is like that; 
... ... 
2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201201152221_0014_m_000000_3:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing row {"studentid":106,"classid":null,"score":635.0} 
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) 
at org.apache.hadoop.mapred.Child.main(Child.java:170) 
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing
row {"studentid":106,"classid":null,"score":635.0} 
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) 
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) 
... 4 more 
Caused by: java.lang.RuntimeException: cannot find field studentname from [0:studentid, 1:classid,
2:score] 
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)

at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)

at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)

at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896) 
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)

at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200)

at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) 
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) 
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) 
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) 
... 5 more 

Mime
View raw message