hive-user mailing list archives

From Mich Talebzadeh <>
Subject Re: Mechanism when doing a select *
Date Mon, 21 Mar 2016 15:17:28 GMT

Your query is a table-level query that covers all rows in the table.

Using ODBC, you are connecting to HiveServer2, which listens on a given port.

Depending on the version of Hive you are running, Hive under the bonnet is
most likely using MapReduce as the execution engine.

Data has to be collected from all blocks that hold data for this table.
Since the query has no predicate, there is no predicate push down, so the
underlying ORC stats cannot be used to skip any data, and every row has to
be sent to the ODBC driver across the network.

The ODBC driver can only communicate with HiveServer2, so there is no
connectivity to individual nodes from your client.

So in summary, HiveServer2 collects the data from all blocks and forwards it
to the client. How the result set is actually collected and filtered for a
given SQL query will depend on many factors, such as the Hive version and
the execution engine in use.
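
To make the data path concrete, here is a minimal toy sketch in Python. It
is not Hive code and does not use any Hive or ODBC API: the class and
function names (DataNode, HiveServer2Model, client_fetch,
build_demo_cluster) are hypothetical, and the model ignores the execution
engine entirely. It only illustrates the point above, namely that the client
talks to a single endpoint and every row passes through it.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


class DataNode:
    """Hypothetical stand-in for a datanode holding blocks of the table."""

    def __init__(self, name: str, blocks: List[List[Row]]) -> None:
        self.name = name
        self.blocks = blocks

    def read_blocks(self) -> Iterator[Row]:
        # A select * has no predicate, so every block is read in full.
        for block in self.blocks:
            yield from block


class HiveServer2Model:
    """Toy model of the single endpoint the ODBC driver connects to."""

    def __init__(self, datanodes: List[DataNode]) -> None:
        self.datanodes = datanodes
        self.rows_forwarded = 0  # counts rows that passed through the server

    def select_star(self) -> Iterator[Row]:
        # Assumption: rows from every node are funnelled through this one
        # process before they reach the client.
        for node in self.datanodes:
            for row in node.read_blocks():
                self.rows_forwarded += 1
                yield row


def client_fetch(server: HiveServer2Model, batch_size: int = 2) -> Iterator[List[Row]]:
    """The client only sees the server, never the datanodes."""
    batch: List[Row] = []
    for row in server.select_star():
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, batch


def build_demo_cluster() -> List[DataNode]:
    # Three made-up datanodes with made-up rows, purely for illustration.
    return [
        DataNode("dn1", [[(1, "a"), (2, "b")]]),
        DataNode("dn2", [[(3, "c")], [(4, "d")]]),
        DataNode("dn3", [[(5, "e")]]),
    ]
```

Iterating over client_fetch(HiveServer2Model(build_demo_cluster())) and then
reading rows_forwarded shows that the server handled every row the client
received, which is the answer to the first of the two questions in the
original post.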


Dr Mich Talebzadeh

On 21 March 2016 at 14:26, Tale Firefly <> wrote:

> Hello guys !
> I'm trying to understand the mechanism for a simple query select * from
> my_table when using HiveServer2.
> I'm using the hortonworks ODBC Driver for HiveServer2.
> I just do a select * from my_table.
> my_table is an ORC table based on files divided into blocks located on all
> my datanodes.
> I have 50 datanodes.
> My question is the following :
> Does all the data go from the datanodes to the node hosting the
> hiveserver2 before coming back to my client ?
> Or does all the data go directly from the datanodes to my client ?
> Hope you can help me o/
> Thank you
> Tale
