hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-287) Speed up SequenceFile sort with memory reduction
Date Tue, 19 Feb 2008 18:48:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570364#action_12570364
] 

Chris Douglas commented on HADOOP-287:
--------------------------------------

The difference in time taken by the respective sort algorithms is not significant compared
to the overhead incurred in the sort phase, i.e. the time it requires to recreate the BlahSorter/BlahSort
objects, recreate and resize all the accounting arrays, and effect the indexed compares/swaps
via IntWritables. There is an additional copy for the index array during mergesort, but- again-
this is insignificant when considering the cost of the sort/spill. Note that the time taken
for each invocation of sort is proportional to the number of keys collected for each partition,
which itself is a small fraction of the time spent in that phase; the number of invocations
of sort is roughly proportional to the number of reducers, or more specifically: the distribution
of output keys effected from the partitioner from a given map within the io.sort.mb limit.
Given the current structure of the sort/spill, the particular sort algorithm is unlikely to
have a notable impact in the general case. The size of the dataset or the proportion of input
to output keys should have little bearing, since the merge code for spilled data is the same,
regardless of the sort algorithm selected.

> Speed up SequenceFile sort with memory reduction
> ------------------------------------------------
>
>                 Key: HADOOP-287
>                 URL: https://issues.apache.org/jira/browse/HADOOP-287
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Benjamin Reed
>            Assignee: Chris Douglas
>         Attachments: 287-1.patch, 287-2.patch, 287-2.patch, s.patch, zoom-sort.patch,
zoom-sort.patch
>
>
> I replaced the merge sort with a quick sort and it yielded approx 30% improvement in
sort time. It also reduced the memory requirement for sorting because the sort is done in
place.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message