hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Mechanism when doing a select *
Date Mon, 21 Mar 2016 15:48:08 GMT
Well, I use Spark as the execution engine.

Now the question is: have you updated the statistics on the ORC table?
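
For reference, a minimal sketch of refreshing the statistics from beeline or the Hive CLI. It assumes the table is the unpartitioned my_table from the original question; a partitioned table may need a PARTITION clause, and older Hive versions may require an explicit column list after FOR COLUMNS.

```sql
-- Assumption: my_table is the unpartitioned ORC table from the original question.

-- Basic table-level statistics (row count, number of files, total size).
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Column-level statistics for all columns, used by the optimizer.
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS;

-- Inspect the result: numRows and totalSize are listed under Table Parameters.
DESCRIBE FORMATTED my_table;
```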

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 21 March 2016 at 15:32, Tale Firefly <tale.hive@gmail.com> wrote:

> Re.
>
> Thanks for your answer.
>
> I'm using Tez as the execution engine for this query,
> and it submits a job to YARN.
>
> Do you know why it launches a job just for a select when I use Tez as
> the execution engine?
>
> BR.
>
> Tale
>
>
> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Hi,
>>
>> Your query is a table-level query that covers all rows in the table.
>>
>> Using ODBC, you are connecting to HiveServer2, which runs on a given port.
>>
>> Depending on the version of Hive you are running, Hive under the bonnet is
>> most likely using MapReduce as the execution engine.
>>
>> Data has to be collected from all blocks that hold data for this table.
>> The underlying ORC stats can only act at table level, as there is no
>> predicate to push down, and the data has to be sent to the ODBC driver
>> over the network.
>>
>> The ODBC driver can only communicate with HiveServer2, so there is no
>> connectivity to individual nodes from your client.
>>
>> So in summary, HiveServer2 collects data from all blocks and forwards it
>> to the client. The actual collection and filtering of the result set in a
>> SQL query will depend on many factors.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 21 March 2016 at 14:26, Tale Firefly <tale.hive@gmail.com> wrote:
>>
>>> Hello guys!
>>>
>>> I'm trying to understand the mechanism for a simple query select * from
>>> my_table when using HiveServer2.
>>>
>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>> I just do a select * from my_table.
>>> my_table is an ORC table based on files divided into blocks located on
>>> all my datanodes.
>>> I have 50 datanodes.
>>>
>>> My question is the following:
>>> Does all the data go from the datanodes to the node hosting
>>> HiveServer2 before coming back to my client?
>>> Or does all the data go directly from the datanodes to my client?
>>>
>>> Hope you can help me o/
>>>
>>> Thank you
>>>
>>> Tale
>>>
>>
>>
>
