hadoop-yarn-dev mailing list archives

From Tsuyoshi Ozawa <oz...@apache.org>
Subject Re: TimSort bug and its workaround
Date Thu, 26 Feb 2015 08:16:29 GMT
Maybe we should first discuss whether the arrays we sort can hold more
than 67108864 elements in our use cases. For example, FairScheduler uses
Collections.sort(), but the number of jobs does not exceed 67108864 in
most use cases, so we can keep using it there. It would also be
reasonable to switch to a safe algorithm where reliability matters.
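
To make the size question concrete, here is a minimal sketch of a guard
around Collections.sort(). The class and method names (SortSizeGuard,
isWithinReportedLimit, sortChecked) are hypothetical and not part of
Hadoop; the limit is simply the element count quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not existing Hadoop code.
class SortSizeGuard {
    // Element count quoted in this thread (2^26); assumed, not measured here.
    static final int REPORTED_LIMIT = 67108864;

    // True when the size does not exceed the count quoted in the bug report.
    static boolean isWithinReportedLimit(int size) {
        return size <= REPORTED_LIMIT;
    }

    // Sorts with Collections.sort, but refuses inputs above the reported limit.
    static <T> void sortChecked(List<T> list, Comparator<? super T> cmp) {
        if (!isWithinReportedLimit(list.size())) {
            throw new IllegalStateException(
                "list size " + list.size() + " exceeds " + REPORTED_LIMIT);
        }
        Collections.sort(list, cmp);
    }

    public static void main(String[] args) {
        List<Integer> jobs = new ArrayList<>(Arrays.asList(3, 1, 2));
        sortChecked(jobs, Comparator.<Integer>naturalOrder());
        System.out.println(jobs);
    }
}
```

A guard like this only documents the assumption about input size; call
sites that can never approach the limit could keep calling
Collections.sort() directly.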

Thanks,
- Tsuyoshi

On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <ozawa@apache.org> wrote:
> Hi hadoop developers,
>
> In the last two weeks, a JDK bug in TimSort, which is used by
> Collections#sort, was reported. How should we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug can cause an ArrayIndexOutOfBoundsException if the number of
> elements is larger than 67108864.
>
> We call Collections.sort in at least 77 places:
> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
> 77
>
> One reasonable workaround is to set the system property
> java.util.Arrays.useLegacyMergeSort to true by default.
>
> Thanks,
> - Tsuyoshi
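
As a rough illustration of the workaround mentioned above:
java.util.Arrays.useLegacyMergeSort is a JVM system property, normally
passed on the command line as -Djava.util.Arrays.useLegacyMergeSort=true.
The sketch below only checks whether the property was requested and then
sorts a small array; the class name LegacyMergeSortCheck is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for checking the workaround; not existing Hadoop code.
class LegacyMergeSortCheck {
    // System property consulted by java.util.Arrays for object-array sorts.
    static final String PROPERTY = "java.util.Arrays.useLegacyMergeSort";

    // True when the JVM was started with -Djava.util.Arrays.useLegacyMergeSort=true.
    static boolean legacyMergeSortRequested() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + legacyMergeSortRequested());
        String[] names = {"c", "a", "b"};
        Arrays.sort(names); // uses the legacy merge sort only if the property is set
        System.out.println(Arrays.toString(names));
    }
}
```

It could be run as
java -Djava.util.Arrays.useLegacyMergeSort=true LegacyMergeSortCheck
Setting the flag on the command line is the safer choice, since the JDK
may read the property only once, so setting it later from code might
not take effect.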
