hive-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: OPTIMIZING A HIVE QUERY
Date Tue, 14 Aug 2012 18:12:52 GMT
> My question was every join in a hive query would constitute to a Mapreduce job.
In the general case, yes. But if one side of your join is small enough (i.e.
it fits entirely in memory), a hash join/map join can be performed, which is
much faster because no reduce phase is required.

Bejoy KS has just provided the right link.
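
To illustrate the idea behind a map join, here is a minimal sketch in plain
Python rather than Hive itself. The function name, table names and sample rows
are hypothetical and only serve the illustration. The small side is loaded
into an in-memory hash table, and each row of the large side is matched
locally while it is streamed, which is why no shuffle or reduce step is needed.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, key):
    """Inner hash join: build a table from the small side, stream the large side."""
    # Build phase: assumes the small side fits entirely in memory.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: every large-side row is matched locally, one at a time.
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data (not taken from the thread).
countries = [
    {"country_id": 1, "country": "FR"},
    {"country_id": 2, "country": "US"},
]
visits = [
    {"visit_id": 10, "country_id": 1},
    {"visit_id": 11, "country_id": 2},
    {"visit_id": 12, "country_id": 3},  # no matching country, so it is dropped
]

joined = list(map_side_join(countries, visits, "country_id"))
```

In Hive, this strategy is usually requested with the MAPJOIN hint or enabled
through the hive.auto.convert.join setting. Check the documentation of your
version for the exact behaviour and size thresholds.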

> Store data in the smarter way? can you please elaborate on this.
That's not Hive related. The same logic applies to an RDBMS. You want to keep a
normalized source of data, but sometimes denormalizing it can greatly
improve your performance. That's one of the advantages of a document store.
It is very dependent on your use cases.
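
As a rough sketch of what denormalizing means here, the plain-Python example
below copies the fields of a small reference table into each record once, at
write time, so that later queries aggregate without any join. The tables,
field names and helper functions are hypothetical and not from this thread.

```python
# Hypothetical normalized source: orders reference customers by id.
customers = {
    1: {"name": "Alice", "city": "Paris"},
    2: {"name": "Bob", "city": "Lyon"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25.0},
    {"order_id": 101, "customer_id": 2, "amount": 40.0},
    {"order_id": 102, "customer_id": 1, "amount": 15.0},
]


def denormalize(orders, customers):
    """Copy the customer fields into each order once, at write time."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


def total_by_city(wide_orders):
    """Aggregate directly on the wide records; no join at read time."""
    totals = {}
    for row in wide_orders:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return totals


wide_orders = denormalize(orders, customers)
totals = total_by_city(wide_orders)
```

The trade-off is duplicated data and more work whenever a customer record
changes, which is why it depends so much on the use case.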

Bertrand

On Tue, Aug 14, 2012 at 7:30 PM, sudeep tokala <sudeeptokala@gmail.com> wrote:

> hi Bertrand,
>
> Thanks for the reply.
>
> My question was every join in a hive query would constitute to a Mapreduce
> job.
> A Mapreduce job goes through serialization and deserialization of objects.
> Isn't that an overhead?
>
> Store data in the smarter way? can you please elaborate on this.
>
> Regards
> Sudeep
>
> On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>
>> You may want to be clearer. Is your question: how can I change the
>> serialization strategy of Hive? (If so, I will let other users answer, and
>> I am also interested in the answer.)
>>
>> Otherwise the answer is simple. If you want to join data that cannot be
>> held in memory, you need to serialize it. The only alternative is to
>> store the data in a smarter way that does not require the join at all.
>> By the way, how do you know that serialization is the bottleneck?
>>
>> Bertrand
>>
>>
>> On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala <sudeeptokala@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala <sudeeptokala@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> How can I avoid serialization and deserialization overhead in a Hive
>>>> join query? Will this improve my query performance?
>>>>
>>>> Regards
>>>> sudeep
>>>>
>>>
>>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>


-- 
Bertrand Dechoux
