hive-user mailing list archives

From "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)"
Subject Hive nested query result storage
Date Tue, 30 Sep 2014 20:53:44 GMT
Hi All,

Please help me understand where and how Hive stores the temporary result of a nested query.

I have written a UDF that reads its data from a table t1 produced by a nested query.
The rows of t1 must be in ascending order, and I have to make sure that all of t1 is processed
by a single mapper. The reason is that my UDF keeps some global variables
that are initialized once per mapper, so if t1 is processed by multiple mappers the output
will be wrong.
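
To illustrate the concern, here is a minimal sketch in plain Java. It is not the actual contract_rangeId UDF and uses no Hive classes; the class RangeIdSketch, the Assigner state holder, and the rule "start a new range id whenever (gsid, contract) changes" are all assumptions made up for the example. It only shows how state that is re-initialized per mapper gives different results once the sorted rows are split across mappers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeIdSketch {

    // Hypothetical stand-in for the UDF's "global variables":
    // one instance is created per simulated mapper.
    static final class Assigner {
        private String lastKey = null;
        private int rangeId = 0;

        // Assumed rule: bump the id each time (gsid, contract) changes.
        int next(String gsid, String contract) {
            String key = gsid + "|" + contract;
            if (!key.equals(lastKey)) {
                rangeId++;
                lastKey = key;
            }
            return rangeId;
        }
    }

    // Processes the rows in order with fresh state, as a single mapper would.
    static List<Integer> runMapper(List<String[]> rows) {
        Assigner assigner = new Assigner();
        List<Integer> ids = new ArrayList<>();
        for (String[] row : rows) {
            ids.add(assigner.next(row[0], row[1]));
        }
        return ids;
    }

    // Simulates two mappers: rows [0, splitAt) and rows [splitAt, end),
    // each starting with its own freshly initialized state.
    static List<Integer> runTwoMappers(List<String[]> rows, int splitAt) {
        List<Integer> ids = new ArrayList<>(runMapper(rows.subList(0, splitAt)));
        ids.addAll(runMapper(rows.subList(splitAt, rows.size())));
        return ids;
    }

    // Already-sorted sample rows (gsid, contract); values are invented.
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"g1", "c1"},
            new String[] {"g1", "c1"},
            new String[] {"g1", "c2"},
            new String[] {"g2", "c1"});
    }

    public static void main(String[] args) {
        System.out.println(runMapper(sampleRows()));        // [1, 1, 2, 3]
        System.out.println(runTwoMappers(sampleRows(), 2)); // [1, 1, 1, 2]
    }
}
```

With one mapper the ids keep increasing across the whole sorted input; with two mappers the second one restarts at 1, which is the wrong output described above.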


select gsid, contract, max_date, min_date,
       contract_rangeId(gsid, contract, max_date, min_date) as range_id
from (select gsid, contract, max_date, min_date
      from tmp_rcc_normwk_gs0_test3
      order by gsid, contract, max_date, min_date) t1;

Since the nested query (select gsid,contract,max_date,min_date from tmp_rcc_normwk_gs0_test3
order by gsid,contract,max_date,min_date) runs with only one reducer, will the outer query
also run with only one mapper?
If so, where is the output of the nested query stored: on HDFS or on the local file system?
I would love to get some help on this.

Vikash Talanki
Engineer - Software