hadoop-mapreduce-issues mailing list archives

From "Radim Kolar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
Date Wed, 19 Dec 2012 02:36:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535588#comment-13535588 ]

Radim Kolar commented on MAPREDUCE-4887:
----------------------------------------

The distribution for this pattern is very smooth. If you were not defending people who depend on
undocumented behavior, you would make it the default.

Dumping buckets distribution: min=902 avg=1043 max=1184
bucket 0 964 items, variance -0.07574304889741132
bucket 1 1042 items, variance -9.587727708533077E-4
bucket 2 1101 items, variance 0.05560882070949185
bucket 3 1039 items, variance -0.003835091083413231
bucket 4 1099 items, variance 0.053691275167785234
bucket 5 1044 items, variance 9.587727708533077E-4
bucket 6 998 items, variance -0.04314477468839885
bucket 7 1040 items, variance -0.0028763183125599234
bucket 8 1184 items, variance 0.13518696069031638
bucket 9 976 items, variance -0.06423777564717162
bucket 10 902 items, variance -0.13518696069031638
bucket 11 1124 items, variance 0.07766059443911794
bucket 12 931 items, variance -0.10738255033557047
bucket 13 1094 items, variance 0.0488974113135187
bucket 14 1152 items, variance 0.10450623202301054
bucket 15 977 items, variance -0.06327900287631831
bucket 16 1057 items, variance 0.013422818791946308
bucket 17 1048 items, variance 0.004793863854266539
bucket 18 1052 items, variance 0.00862895493767977
bucket 19 1042 items, variance -9.587727708533077E-4
bucket 20 1028 items, variance -0.014381591562799617
bucket 21 1038 items, variance -0.004793863854266539
bucket 22 1037 items, variance -0.005752636625119847
bucket 23 1040 items, variance -0.0028763183125599234
bucket 24 1084 items, variance 0.039309683604985615
bucket 25 974 items, variance -0.06615532118887824
bucket 26 954 items, variance -0.08533077660594439
bucket 27 1122 items, variance 0.07574304889741132
bucket 28 1009 items, variance -0.032598274209012464
bucket 29 1095 items, variance 0.04985618408437201
bucket 30 1109 items, variance 0.06327900287631831
bucket 31 978 items, variance -0.062320230105465
0 of 32 are too small or large buckets
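The summary line can be cross-checked against the 32 per-bucket counts. The sketch below (the class name BucketStats is hypothetical) recomputes it. For these counts the total is 33334 items, so the arithmetic mean is 1041.6875. The printed avg=1043 is exactly (min+max)/2 = (902+1184)/2, and the "variance" column matches (count - 1043)/1043. That makes it a relative deviation from the midrange, not a statistical variance. This is inferred from the numbers only, since the code that produced the dump is not shown.

```java
import java.util.Arrays;

/** Recomputes the summary of the bucket dump from its per-bucket counts. */
final class BucketStats {

    // Item counts for buckets 0..31, copied from the dump above.
    static final int[] COUNTS = {
        964, 1042, 1101, 1039, 1099, 1044, 998, 1040,
        1184, 976, 902, 1124, 931, 1094, 1152, 977,
        1057, 1048, 1052, 1042, 1028, 1038, 1037, 1040,
        1084, 974, 954, 1122, 1009, 1095, 1109, 978
    };

    static int min(int[] c) { return Arrays.stream(c).min().getAsInt(); }
    static int max(int[] c) { return Arrays.stream(c).max().getAsInt(); }
    static int sum(int[] c) { return Arrays.stream(c).sum(); }

    // Midpoint of the range; this reproduces the printed "avg=1043".
    static int midrange(int[] c) { return (min(c) + max(c)) / 2; }

    // True arithmetic mean of the counts.
    static double mean(int[] c) { return sum(c) / (double) c.length; }

    // Relative deviation from a reference value; reproduces the "variance" column.
    static double relativeDeviation(int count, int reference) {
        return (count - reference) / (double) reference;
    }

    public static void main(String[] args) {
        int ref = midrange(COUNTS);
        System.out.printf("min=%d midrange=%d max=%d mean=%.4f%n",
                min(COUNTS), ref, max(COUNTS), mean(COUNTS));
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.println("bucket " + i + " " + COUNTS[i]
                    + " items, relative deviation " + relativeDeviation(COUNTS[i], ref));
        }
    }
}
```

Using the midrange as the reference makes the two extreme buckets (8 and 10) show equal and opposite deviations of about 0.135. Measured against the true mean, both extremes would shift slightly.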

                
> Rehashing partitioner for better distribution
> ---------------------------------------------
>
>                 Key: MAPREDUCE-4887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Radim Kolar
>         Attachments: rehash1.txt, rehash2.txt, rehash3.txt
>
>
> Rehash the value returned by Object.hashCode() to get a better distribution.
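The issue description says only that the value returned by Object.hashCode() is rehashed. The attached patches (rehash1.txt to rehash3.txt) are not reproduced here. The following is a minimal sketch that assumes the MurmurHash3 32-bit finalizer (fmix32) as the mixing step. This is a stand-in and not necessarily the function used in the patches. The sketch contrasts it with the formula used by Hadoop's HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class and method names are hypothetical, and the code is plain Java with no Hadoop dependency. With 32 partitions and keys whose hash codes are all multiples of 32, the plain formula sends every key to partition 0. The rehashed variant should spread them across partitions.

```java
/** Hypothetical sketch, not the MAPREDUCE-4887 patch: plain vs. rehashed partitioning. */
final class RehashPartitionSketch {

    // Same formula as Hadoop's HashPartitioner.getPartition().
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumed mixing step: the MurmurHash3 32-bit finalizer (fmix32).
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Rehash hashCode() before applying the same mask-and-modulo step.
    static int rehashPartition(Object key, int numPartitions) {
        return (fmix32(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts how many keys land in each partition.
    static int[] histogram(int[] keys, int numPartitions, boolean rehash) {
        int[] counts = new int[numPartitions];
        for (int k : keys) {
            Integer key = k; // Integer.hashCode() returns the int value itself
            int p = rehash ? rehashPartition(key, numPartitions)
                           : plainPartition(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    static int nonEmpty(int[] counts) {
        int n = 0;
        for (int c : counts) {
            if (c > 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int numPartitions = 32;
        // Patterned keys: every hash code is a multiple of 32.
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) keys[i] = i * 32;
        System.out.println("plain:  " + nonEmpty(histogram(keys, numPartitions, false)) + " non-empty partitions");
        System.out.println("rehash: " + nonEmpty(histogram(keys, numPartitions, true)) + " non-empty partitions");
    }
}
```

The mixing matters most when the partition count is a power of two. In that case, taking a non-negative hash modulo 32 keeps only its lowest 5 bits, so patterns in those bits pass straight through unless the higher bits are first folded into them.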

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
