hive-user mailing list archives

From Sarfraz Ramay <sarfraz.ra...@gmail.com>
Subject Re: Hive Vs Pig: Master's thesis
Date Sat, 03 May 2014 17:12:30 GMT
Thanks for the suggestion. Could you please explain a little more what you
mean by "focusing on the design, the implementation with third party tools"?
Do you mean comparing them? And by scripts, do you mean scripts of UDFs,
SerDes and Loaders?




Regards,
Sarfraz Rasheed Ramay (DIT)
Dublin, Ireland.


On Sat, May 3, 2014 at 4:23 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> IMHO comparing the "performance" is boring and has been done umpteen times
> before. The world won't get much out of another performance benchmark,
> other than a bunch of fanboys saying "Look, ours is faster hahahahah" and
> then the other side saying "but in this case ours is faster, and that is the
> more important case". Benchmarks are easy to bias and manipulate, and
> comparing two like but not exact systems is hard. For example, you will see
> impala "winning" benchmarks HPC by re-writing queries, and then someone in
> tez re-writes it another way, tunes a setting, and then they are "winning"
> the benchmark.
>
> You would be better off focusing on the design, the implementation with
> third party tools (udfs, serdes, loaders), and the nuances of a more
> procedural language than a declarative one. Look in the world for scripts
> and see who is deploying them effectively.
>
>
>
>
>
> On Sat, May 3, 2014 at 4:46 AM, Sarfraz Ramay <sarfraz.ramay@gmail.com> wrote:
>
>> Thanks Thejas for your input! These are interesting and very specific,
>> which is exactly what is required for a master's thesis.
>>
>> Are there any publications on Hive and the evaluation of its performance
>> that I can use to compare?
>>
>> Regards,
>> Sarfraz Rasheed Ramay (DIT)
>> Dublin, Ireland.
>>
>>
>> On Sat, May 3, 2014 at 3:07 AM, Thejas Nair <thejas@hortonworks.com> wrote:
>>
>>> The primary difference between hive and pig is the language. There are
>>> implementation differences that will result in performance
>>> differences, but it will be hard to figure out what aspect of the
>>> implementation is responsible for what improvement.
>>>
>>> I think a more interesting project would be to compare the impact of
>>> various performance improvements in hive. There are many features that
>>> you can turn on and off.
>>>
>>> example -
>>> - hive vectorization
>>> - file format - text vs RCFile vs ORC
>>> - compressed vs uncompressed
>>> - mapreduce vs tez execution engine
>>> - stats optimized queries
>>>
>>>
>>>
>>> On Thu, May 1, 2014 at 5:47 AM, Sarfraz Ramay <sarfraz.ramay@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> It seems that both Hive and Pig are used for managing large data sets.
>>> >> Hive is more SQL oriented whereas Pig is more for data flows. I am doing
>>> >> a master's thesis on the performance evaluation of both. Can someone
>>> >> please provide a list of tasks that would make for an interesting
>>> >> comparison?
>>> >>
>>> >>
>>> >> What is Hive good at ?
>>> >>
>>> >> What is Pig good at ?
>>> >>
>>> >> Ideally, I would like to take what Hive is good at and test it in Pig and
>>> >> vice versa. The competitive characteristics would make for an interesting
>>> >> comparison.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Regards,
>>> >> Sarfraz Rasheed Ramay (DIT)
>>> >> Dublin, Ireland.
>>> >
>>> >
>>>
>>>
>>
>>
>
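
Editor's sketch: the feature toggles Thejas lists (vectorization, file format, compression, execution engine, stats-optimized queries) can be laid out as a test matrix. The snippet below is only an illustration of that idea, not something posted in the thread. The property names are Hive session settings as I understand them and should be checked against the Hive version in use; the table name `facts` and the one-table-per-format naming scheme are hypothetical. Some combinations may turn out to be no-ops (for example, vectorized execution was, as far as I know, limited to ORC in Hive releases of that period).

```python
from itertools import product

# Hive session properties to toggle. Names are assumptions to verify
# against the configuration documentation for your Hive version.
TOGGLES = {
    "hive.vectorized.execution.enabled": ["true", "false"],
    "hive.execution.engine": ["mr", "tez"],
    "hive.exec.compress.output": ["true", "false"],
    "hive.compute.query.using.stats": ["true", "false"],
}

# Storage formats named in the thread (text vs RCFile vs ORC).
FILE_FORMATS = ["TEXTFILE", "RCFILE", "ORC"]


def benchmark_matrix(toggles=TOGGLES, file_formats=FILE_FORMATS):
    """Yield (file_format, settings) for every combination of toggles."""
    names = list(toggles)
    for fmt in file_formats:
        for values in product(*(toggles[name] for name in names)):
            yield fmt, dict(zip(names, values))


def render_script(fmt, settings, base_table="facts",
                  query="SELECT COUNT(*) FROM {table}"):
    """Render one HiveQL script: SET lines followed by the query.

    Assumes a copy of the (hypothetical) base table already exists in
    each format, named e.g. facts_orc, facts_rcfile, facts_textfile.
    """
    table = f"{base_table}_{fmt.lower()}"
    lines = [f"SET {name}={value};" for name, value in settings.items()]
    lines.append(query.format(table=table) + ";")
    return "\n".join(lines)


if __name__ == "__main__":
    configs = list(benchmark_matrix())
    print(f"{len(configs)} configurations")
    fmt, settings = configs[0]
    print(render_script(fmt, settings))
```

Each generated script could be run with the Hive CLI and timed separately. Changing one setting at a time against a fixed baseline, rather than running the full cross product, makes it easier to attribute a difference to a single feature.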
