hive-user mailing list archives

From Nitin Kumar
Subject Understanding hive query plan for Join operation
Date Sat, 17 Sep 2016 11:18:10 GMT

I have a query and its associated query plan for
simulated data.

The number of rows in the table lte_data_tenmillion is 10000000
The number of rows in the table subscriber data is 100000

*For both tables none of the rows have a null value in the subscriber_id
column. *

I find it difficult to understand why the query plan shows the number of
rows after applying the predicate (subscriber_id is not null (type:
boolean)) as exactly half the original number of rows.

The same is true of the other filter operator.

Also, the total number of rows in the resulting data, as shown under
"File Output Operator [FS_20]", is 5500000. However, the actual number of
rows in the resulting table is 2499723.
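
To make the mismatch concrete, here is a rough sketch of the arithmetic
behind the numbers above. The row counts are the ones quoted in this
message. The 1.1 multiplier (ASSUMED_JOIN_FACTOR) is only an assumption
introduced for illustration: 5500000 happens to equal 1.1 times the halved
row count of the larger table, but nothing in the plan output quoted here
confirms that the planner computes it this way.

```python
# Row counts quoted in this message.
LTE_ROWS = 10_000_000            # lte_data_tenmillion
SUBSCRIBER_ROWS = 100_000        # subscriber data
PLAN_OUTPUT_ESTIMATE = 5_500_000 # File Output Operator [FS_20]
ACTUAL_OUTPUT_ROWS = 2_499_723   # rows actually written


def halved(rows: int) -> int:
    """Row count the plan shows after the 'is not null' filter: half the input."""
    return rows // 2


lte_after_filter = halved(LTE_ROWS)
subscriber_after_filter = halved(SUBSCRIBER_ROWS)

# ASSUMPTION (not confirmed by the plan): the join estimate is the larger
# filtered input scaled by 1.1. Integer arithmetic avoids float rounding.
ASSUMED_JOIN_FACTOR_NUM, ASSUMED_JOIN_FACTOR_DEN = 11, 10
guessed_join_estimate = (
    max(lte_after_filter, subscriber_after_filter)
    * ASSUMED_JOIN_FACTOR_NUM
    // ASSUMED_JOIN_FACTOR_DEN
)

# How far the plan's estimate is from the real result.
overestimate_ratio = PLAN_OUTPUT_ESTIMATE / ACTUAL_OUTPUT_ROWS

print(lte_after_filter, subscriber_after_filter)
print(guessed_join_estimate == PLAN_OUTPUT_ESTIMATE)
print(f"plan estimate / actual rows = {overestimate_ratio:.2f}")
```

If that guess holds, the figures in the plan would be estimates derived
from table sizes rather than counts of rows actually scanned, which would
explain why they do not match the result.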

I may be misinterpreting the query plan. I would highly appreciate
it if someone could explain the inconsistencies I observe between the
query plan and the actual result.

Thanks and regards,
Nitin Kumar
