hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-2284) BasicTypeSorterBase.compare calls progress on each compare
Date Mon, 26 Nov 2007 22:23:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545627 ]

owen.omalley edited comment on HADOOP-2284 at 11/26/07 2:21 PM:
-----------------------------------------------------------------

Another important note on this is that the ratio of overhead to real work in the compare
looks really bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 cpu seconds, while
the actual work, done in org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int),
takes only 1,158 seconds. Thus, it looks like there is 64% overhead in the abstraction levels
wrapped around the compare. Part of that overhead is the progress call, but I suspect that we
should work on stripping out more of the overhead.

      was (Author: owen.omalley):
    Another important note on this is that the ratio of "overhead" in the compare looks really
bad. In particular,
org.apache.hadoop.mapred.MergeSort.compare(Object,Object) is taking 2,503 cpu seconds and
the work is being done in org.apache.hadoop.io.Text$Comparator.compare(byte[],int,int,byte[],int,int)
is only 1158 seconds. Thus, it looks like there is 64% overhead in the abstraction levels
wrapped around the compare. Part of that overhead is the progress, but I suspect that should
strip out more of the overhead.
  
> BasicTypeSorterBase.compare calls progress on each compare
> ----------------------------------------------------------
>
>                 Key: HADOOP-2284
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2284
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>             Fix For: 0.16.0
>
>
> The inner loop of the sort is calling progress on each compare. I think it would make
> more sense to call progress in the sort rather than in the compare, or at most once every
> 10000 compares. In the performance numbers, the calls to progress made as part of the sort
> consume 12% of the total cpu time when running word count under the local runner.
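
The "at most every 10000 compares" suggestion in the description could look roughly like the
sketch below. It is only an illustration: ProgressCallback and ThrottledProgressComparator are
hypothetical names invented for this example, not Hadoop classes, and this is not necessarily
how the issue was fixed.

```java
import java.util.Comparator;

// Hypothetical stand-in for the framework's progress hook (an assumption, not a Hadoop interface).
interface ProgressCallback {
    void progress();
}

// Hypothetical wrapper: delegates every compare, but reports progress
// only once per `interval` compares instead of on each one.
final class ThrottledProgressComparator<T> implements Comparator<T> {
    private final Comparator<T> delegate;
    private final ProgressCallback callback;
    private final int interval;
    private int comparesSinceReport = 0;

    ThrottledProgressComparator(Comparator<T> delegate, ProgressCallback callback, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.delegate = delegate;
        this.callback = callback;
        this.interval = interval;
    }

    @Override
    public int compare(T a, T b) {
        // Cheap counter check in the hot path; the callback runs only every `interval` calls.
        if (++comparesSinceReport >= interval) {
            comparesSinceReport = 0;
            callback.progress();
        }
        return delegate.compare(a, b);
    }
}
```

With an interval of 10000, the per-compare cost of progress reporting drops to one increment
and one branch. Calling progress from the sort loop itself, as the description prefers, would
remove even that from the compare.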

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

