hive-dev mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-16341) Tez Task Execution Summary has incorrect input record counts on some operators
Date Fri, 31 Mar 2017 01:03:41 GMT
Jason Dere created HIVE-16341:
---------------------------------

             Summary: Tez Task Execution Summary has incorrect input record counts on some operators
                 Key: HIVE-16341
                 URL: https://issues.apache.org/jira/browse/HIVE-16341
             Project: Hive
          Issue Type: Bug
          Components: Tez
            Reporter: Jason Dere
            Assignee: Jason Dere


{noformat}
Task Execution Summary
------------------------------------------------------------------------------------------------------------------------------
  VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
------------------------------------------------------------------------------------------------------------------------------
     Map 1          167                0             0      17640.00     2,109,200       23,068    150,000,004      11,995,136
    Map 11            5                0             0      10559.00        71,960          633      4,023,690         799,900
    Map 13            1                0             0       2244.00         6,090           29             25               3
     Map 3            1                0             0       2849.00         7,080           99             25               3
     Map 5          271                0             0      55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
     Map 7          241                0             0      91243.00     5,020,860       71,182  1,827,250,341     652,413,443
Reducer 10            1                0             0       1010.00         1,900            0              4               0
Reducer 12            1                0             0       3854.00         1,320            0        799,900               1
Reducer 14            1                0             0       1420.00         3,790           45              3               1
 Reducer 2            1                0             0       9720.00         6,220          122     11,995,136               1
 Reducer 4            1                0             0        810.00         2,100          105              3               1
 Reducer 6            1                0             0      24863.00         3,260            5  1,500,000,161               1
 Reducer 8          412                0             0      88215.00    17,106,440      184,524  2,165,208,640           1,864
 Reducer 9            2                0             0      29752.00         3,980            0          1,864               4
------------------------------------------------------------------------------------------------------------------------------
{noformat}

Seeing this on queries that use runtime filtering. The INPUT_RECORDS values look incorrect for
the reducers that are responsible for aggregating the min/max/bloomfilter (Reducers 12, 14,
2, 6). For example, Reducer 2 shows about 12M input records (11,995,136). However, the task
logs for Reducer 2 show only 167 input records.

It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer 8), but Map 1's total
output row count (rather than just the rows going to each specific vertex) is being counted
in the input rows for both Reducer 2 and Reducer 8.
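
A quick arithmetic check on the figures in the summary is consistent with this. The sketch
below is plain Python, not Hive or Tez code; the dictionary names and the helper function are
made up for illustration, and the numbers are copied from the table. The last check (Reducer 8's
input equals the combined output of Map 1, Map 11, Map 5 and Map 7) is only an observation about
the numbers: this report confirms Map 1 as an input to Reducer 8, and the other three are an
assumption.

```python
# OUTPUT_RECORDS per map vertex, copied from the Task Execution Summary.
map_output = {
    "Map 1": 11_995_136,
    "Map 11": 799_900,
    "Map 13": 3,
    "Map 3": 3,
    "Map 5": 1_500_000_161,
    "Map 7": 652_413_443,
}

# INPUT_RECORDS reported for the reducers that aggregate min/max/bloomfilter.
reducer_input = {
    "Reducer 12": 799_900,
    "Reducer 14": 3,
    "Reducer 2": 11_995_136,
    "Reducer 6": 1_500_000_161,
}


def maps_with_output(records):
    """Return the map vertices whose total OUTPUT_RECORDS equals `records`."""
    return [vertex for vertex, out in map_output.items() if out == records]


# Each suspicious reducer input matches some map vertex's *total* output.
matches = {r: maps_with_output(n) for r, n in reducer_input.items()}
for reducer, maps in matches.items():
    print(f"{reducer}: INPUT_RECORDS equals total output of {maps}")

# Assumption: Reducer 8 reads from these four maps (only Map 1 is confirmed).
reducer_8_input = 2_165_208_640
assumed_sources = ("Map 1", "Map 11", "Map 5", "Map 7")
assumed_total = sum(map_output[m] for m in assumed_sources)
print("Reducer 8 matches summed totals:", assumed_total == reducer_8_input)
```

Map 13 and Map 3 both emit 3 rows, so the check cannot tell which of them feeds Reducer 14.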



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
