impala-issues mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5448) Invalid number of files reported in Parquet scan node
Date Wed, 07 Jun 2017 06:52:18 GMT
Mostafa Mokhtar created IMPALA-5448:
---------------------------------------

             Summary: Invalid number of files reported in Parquet scan node 
                 Key: IMPALA-5448
                 URL: https://issues.apache.org/jira/browse/IMPALA-5448
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Mostafa Mokhtar
            Priority: Minor


It appears that the number of files reported by the HDFS scan node is miscounted when reading
Parquet data. For the scan node below, the number of files should equal the number of
RowGroups and Footers (73), but the reported value is 219, which is 73 x NumColumns (3).

{code}
  HDFS_SCAN_NODE (id=0):(Total: 13s749ms, non-child: 13s749ms, % non-child: 100.00%)
          Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:9/1.90 GB 3:12/2.65 GB 2:5/936.63 MB 6:9/1.74 GB 1:8/1.66 GB 5:10/1.83 GB 0:9/2.07 GB 4:11/2.40 GB

          ExecOption: PARQUET Codegen Enabled, Codegen enabled: 73 out of 73
          Runtime filters: Only following filters arrived: , waited 4s918ms
          Hdfs Read Thread Concurrency Bucket: 0:33.33% 1:48.48% 2:6.061% 3:12.12% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 
          File Formats: PARQUET/SNAPPY:219 
          BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 200.00 KB, 129.86 MB, 314.73 MB, 562.12 MB, 1.09 GB, 1.32 GB, 2.37 GB, 3.68 GB, 4.34 GB, 4.87 GB, 5.22 GB, 5.39 GB, 5.58 GB, 5.63 GB, 5.66 GB, 5.69 GB, 5.71 GB, 5.75 GB, 5.78 GB, 5.82 GB, 5.86 GB, 5.90 GB, 5.94 GB, 5.97 GB
           - FooterProcessingTime: (Avg: 711.035ms ; Min: 12.738ms ; Max: 1s958ms ; Number of samples: 73)
           - AverageHdfsReadThreadConcurrency: 0.97 
           - AverageScannerThreadConcurrency: 17.70 
           - BytesRead: 6.01 GB (6452101777)
           - BytesReadDataNodeCache: 0
           - BytesReadLocal: 6.01 GB (6452101777)
           - BytesReadRemoteUnexpected: 0
           - BytesReadShortCircuit: 6.01 GB (6452101777)
           - DecompressionTime: 16s189ms
           - MaxCompressedTextFileLength: 0
           - NumColumns: 3 (3)
           - NumDisksAccessed: 8 (8)
           - NumRowGroups: 73 (73)
           - NumScannerThreadsStarted: 52 (52)
           - PeakMemoryUsage: 2.09 GB (2248246487)
           - PerReadThreadRawHdfsThroughput: 363.03 MB/sec
           - RemoteScanRanges: 0 (0)
           - RowBatchQueueGetWaitTime: 8s786ms
           - RowBatchQueuePutWaitTime: 3s079ms
           - RowsRead: 342.13M (342131176)
           - RowsReturned: 2.54M (2537896)
           - RowsReturnedRate: 184.58 K/sec
           - ScanRangesComplete: 73 (73)
           - ScannerThreadsInvoluntaryContextSwitches: 3.97K (3967)
           - ScannerThreadsTotalWallClockTime: 4m41s
             - MaterializeTupleTime(*): 13s302ms
             - ScannerThreadsSysTime: 3s043ms
             - ScannerThreadsUserTime: 26s263ms
           - ScannerThreadsVoluntaryContextSwitches: 23.15K (23148)
           - TotalRawHdfsReadTime(*): 16s949ms
           - TotalReadThroughput: 359.75 MB/sec
{code}
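
The arithmetic behind the report can be checked directly against the counters in the profile above. The following is a minimal Python sketch and is not part of Impala; the helper names `read_counter` and `read_file_format_count` are hypothetical, and the regular expressions assume the counter layout shown in this particular profile.

```python
import re

# Counters copied from the profile quoted above.
PROFILE_EXCERPT = """
          File Formats: PARQUET/SNAPPY:219
           - NumColumns: 3 (3)
           - NumRowGroups: 73 (73)
           - ScanRangesComplete: 73 (73)
"""


def read_counter(profile: str, name: str) -> int:
    """Return the first integer following '<name>:' (hypothetical helper)."""
    match = re.search(rf"{re.escape(name)}:\s*(\d+)", profile)
    if match is None:
        raise ValueError(f"counter {name!r} not found")
    return int(match.group(1))


def read_file_format_count(profile: str, fmt: str) -> int:
    """Return the count reported for one entry of 'File Formats' (hypothetical helper)."""
    match = re.search(rf"File Formats:.*?{re.escape(fmt)}:(\d+)", profile)
    if match is None:
        raise ValueError(f"file format {fmt!r} not found")
    return int(match.group(1))


reported_files = read_file_format_count(PROFILE_EXCERPT, "PARQUET/SNAPPY")
num_columns = read_counter(PROFILE_EXCERPT, "NumColumns")
num_row_groups = read_counter(PROFILE_EXCERPT, "NumRowGroups")

# The reported value matches row groups multiplied by columns,
# not the row group count that the report expects.
print(reported_files, num_row_groups * num_columns, num_row_groups)
```

For this profile the reported count equals NumRowGroups x NumColumns rather than NumRowGroups, which is consistent with the count being incremented once per column instead of once per file.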



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
