hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16341) Tez Task Execution Summary has incorrect input record counts on some operators
Date Thu, 06 Apr 2017 20:53:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959640#comment-15959640 ]

Jason Dere edited comment on HIVE-16341 at 4/6/17 8:52 PM:
-----------------------------------------------------------

Committed to master/branch-2/branch-2.3


was (Author: jdere):
Committed to master

> Tez Task Execution Summary has incorrect input record counts on some operators
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16341
>                 URL: https://issues.apache.org/jira/browse/HIVE-16341
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>             Fix For: 2.3.0, 3.0.0
>
>         Attachments: HIVE-16341.1.patch, HIVE-16341.2.patch
>
>
> {noformat}
> Task Execution Summary
> --------------------------------------------------------------------------------------------------------------------------------
>   VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> --------------------------------------------------------------------------------------------------------------------------------
>      Map 1          167                0             0       17640.00     2,109,200       23,068    150,000,004      11,995,136
>     Map 11            5                0             0       10559.00        71,960          633      4,023,690         799,900
>     Map 13            1                0             0        2244.00         6,090           29             25               3
>      Map 3            1                0             0        2849.00         7,080           99             25               3
>      Map 5          271                0             0       55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
>      Map 7          241                0             0       91243.00     5,020,860       71,182  1,827,250,341     652,413,443
> Reducer 10            1                0             0        1010.00         1,900            0              4               0
> Reducer 12            1                0             0        3854.00         1,320            0        799,900               1
> Reducer 14            1                0             0        1420.00         3,790           45              3               1
>  Reducer 2            1                0             0        9720.00         6,220          122     11,995,136               1
>  Reducer 4            1                0             0         810.00         2,100          105              3               1
>  Reducer 6            1                0             0       24863.00         3,260            5  1,500,000,161               1
>  Reducer 8          412                0             0       88215.00    17,106,440      184,524  2,165,208,640           1,864
>  Reducer 9            2                0             0       29752.00         3,980            0          1,864               4
> --------------------------------------------------------------------------------------------------------------------
> {noformat}
> Seeing this on queries that use runtime filtering. The INPUT_RECORDS values look incorrect
> for the reducers responsible for aggregating the min/max/bloomfilter (Reducers 12, 14, 2, 6).
> For example, Reducer 2 shows 12M input records, but the task logs for Reducer 2 show only
> 167 input records.
> It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer 8), but the total
> output rows for Map 1 (rather than just the rows going to each specific vertex) are being
> counted in the input rows for both Reducer 2 and Reducer 8.
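
The miscount described in the report can be illustrated with a small, self-contained sketch. This is not Hive or Tez code: the dictionaries and function names are hypothetical, and the Map 1 -> Reducer 8 row count is an assumption (simply the remainder of Map 1's total output), since the report only gives the 167 rows seen in the Reducer 2 task logs.

```python
# Total OUTPUT_RECORDS per source vertex, as shown in the summary table.
total_output = {"Map 1": 11_995_136}

# Rows sent along each edge (source vertex, destination vertex).
# 167 is the figure from the Reducer 2 task logs; the Reducer 8 figure
# is an ASSUMED remainder, used here for illustration only.
edge_output = {
    ("Map 1", "Reducer 2"): 167,
    ("Map 1", "Reducer 8"): 11_995_136 - 167,
}


def input_records_from_totals(vertex):
    """Miscount: credit each destination with its sources' TOTAL output."""
    sources = {src for (src, dst) in edge_output if dst == vertex}
    return sum(total_output[src] for src in sources)


def input_records_per_edge(vertex):
    """Count only the rows sent along edges that end at this vertex."""
    return sum(rows for (src, dst), rows in edge_output.items() if dst == vertex)


if __name__ == "__main__":
    for v in ("Reducer 2", "Reducer 8"):
        print(v, input_records_from_totals(v), input_records_per_edge(v))
```

With the total-based count, both Reducer 2 and Reducer 8 are credited with all 11,995,136 rows from Map 1, which matches the INPUT_RECORDS reported for Reducer 2 in the table. The per-edge count gives Reducer 2 only the 167 rows it actually received.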



