hive-dev mailing list archives

From "Mohammad Kamrul Islam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
Date Tue, 17 Sep 2013 01:05:52 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769049#comment-13769049 ]

Mohammad Kamrul Islam commented on HIVE-4732:
---------------------------------------------

[~appodictic]: I see your point, and the link is indeed very informative.
As the link mentions, ID collisions are extremely rare.
Pasted from Wikipedia:
"To put these numbers into perspective, the annual risk of someone being hit by a meteorite
is estimated to be one chance in 17 billion,[38] which means the probability is about 0.00000000006
(6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year
and having one duplicate. In other words, only after generating 1 billion UUIDs every second
for the next 100 years, the probability of creating just one duplicate would be about 50%.
The probability of one duplicate would be about 50% if every person on earth owns 600 million
UUIDs."

With probabilities this low, is it really necessary to make things more complex? Moreover,
there are usually only a few of these IDs in one Hive session.
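
To make the numbers concrete, here is a rough sketch of the birthday-bound approximation
behind the quoted figures. It assumes the IDs are random (version 4) UUIDs with 122 random
bits, as the quoted passage suggests; the class and method names are made up for
illustration and are not part of the patch.

```java
public class UuidCollision {
    // Assumption: version 4 UUIDs, which carry 122 random bits.
    static final double SPACE = Math.pow(2, 122);

    // Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2N)),
    // where n is the number of IDs generated and N the size of the ID space.
    // expm1 keeps precision when the probability is extremely small.
    static double collisionProbability(double n) {
        return -Math.expm1(-n * (n - 1) / (2.0 * SPACE));
    }

    public static void main(String[] args) {
        // A generous guess at the number of IDs in one session.
        System.out.println(collisionProbability(1_000));
        // Roughly the count at which a duplicate becomes a coin flip.
        System.out.println(collisionProbability(2.71e18));
    }
}
```

With only a handful of IDs per session, the first figure is vanishingly small, which is
the point being made above.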

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> ---------------------------------------------------------------------
>
>                 Key: HIVE-4732
>                 URL: https://issues.apache.org/jira/browse/HIVE-4732
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Mark Wagner
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. Changing
> to compare hashcodes (which can be computed once then reused) will improve performance.
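
The idea in the description can be sketched roughly as follows. This is only an
illustration, not the attached patch: HashedSchema and sameAs are hypothetical names, and
the type parameter stands in for whatever schema class is being compared. The hash is
computed once at construction and reused, so unequal schemas are usually rejected without
a deep equals() call.

```java
import java.util.Objects;

// Hypothetical wrapper: caches the schema's hash code so repeated
// comparisons can skip the expensive equals() in the common cases.
public final class HashedSchema<T> {
    private final T schema;
    private final int hash;

    public HashedSchema(T schema) {
        this.schema = Objects.requireNonNull(schema);
        this.hash = schema.hashCode(); // computed once, then reused
    }

    public boolean sameAs(HashedSchema<T> other) {
        // Same wrapper or same underlying instance: no deep comparison needed.
        if (this == other || schema == other.schema) {
            return true;
        }
        // Different cached hashes prove the schemas differ.
        if (hash != other.hash) {
            return false;
        }
        // Equal hashes do not prove equality, so confirm with equals().
        return schema.equals(other.schema);
    }
}
```

Relying on the hash alone would drop the final equals() call at the cost of a small
chance of a false match; the sketch keeps it to stay exact.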

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
