impala-issues mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5429) Use a thread pool to load block metadata in parallel
Date Sat, 03 Jun 2017 00:03:04 GMT
Mostafa Mokhtar created IMPALA-5429:
---------------------------------------

             Summary: Use a thread pool to load block metadata in parallel 
                 Key: IMPALA-5429
                 URL: https://issues.apache.org/jira/browse/IMPALA-5429
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 2.9.0
            Reporter: Mostafa Mokhtar
            Assignee: Marcel Kornacker
         Attachments: flight_recording_s3bench1vpcclouderacom4711_2.jfr

Metadata loading for tables with many partitions can be fairly slow, especially on S3 and ADLS.
The operation is largely latency-bound, so multiple threads should help speed up the process.

Listing files from multiple partitions in parallel should provide a good speedup, especially for
S3 and ADLS, where latencies are usually higher than on HDFS.

HdfsTable.loadPartitionFileMetadata(StorageDescriptor, HdfsPartition) might be a good starting
point.
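
As a rough illustration of the proposal, the sketch below fans per-partition listing calls out to a fixed-size thread pool so that their latencies overlap. It is not Impala code: the class ParallelPartitionListing, the method listAll, and the lister callback (a stand-in for whatever per-partition listing call is used, for example FileSystem.listFiles) are hypothetical names, and the pool size is an assumed tuning parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch; none of these names exist in Impala.
final class ParallelPartitionListing {

  // Lists the files of every partition using a fixed-size thread pool.
  // 'lister' is an assumed stand-in for the per-partition listing call.
  static Map<String, List<String>> listAll(
      List<String> partitionPaths,
      Function<String, List<String>> lister,
      int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one task per partition so the remote calls run concurrently.
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String path : partitionPaths) {
        Callable<List<String>> task = () -> lister.apply(path);
        futures.add(pool.submit(task));
      }
      // Collect results in submission order so the output is deterministic.
      Map<String, List<String>> result = new LinkedHashMap<>();
      for (int i = 0; i < partitionPaths.size(); i++) {
        String path = partitionPaths.get(i);
        try {
          result.put(path, futures.get(i).get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while listing " + path, e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("Listing failed for " + path, e.getCause());
        }
      }
      return result;
    } finally {
      // Stop accepting new tasks; already submitted tasks are allowed to finish.
      pool.shutdown();
    }
  }
}
```

A caller would pass the partition locations and a function that performs the actual listing. Because the work is latency-bound rather than CPU-bound, a pool larger than the core count may be reasonable, though the right size would have to be measured against the target store.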

||Stack-Trace||Count||Percentage(%)||Total||
|com.amazonaws.services.s3.AmazonS3Client.listObjects(ListObjectsRequest)|4,340|75.649|83,489,694,712|
|---org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(ListObjectsRequest)|4,340|75.649|83,489,694,712|
|------org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(Path,-String,-Set)|3,256|56.754|63,540,096,016|
|---------org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(Path,-boolean)|3,256|56.754|63,540,096,016|
|------------org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(Path)|3,256|56.754|63,540,096,016|
|---------------org.apache.hadoop.fs.FileSystem.exists(Path)|2,178|37.964|45,375,122,798|
|------------------org.apache.hadoop.fs.s3a.S3AFileSystem.exists(Path)|2,178|37.964|45,375,122,798|
|---------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition)|1,082|18.86|23,383,160,065|
|------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List)|1,082|18.86|23,383,160,065|
|---------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean)|1,082|18.86|23,383,160,065|
|------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table,-boolean,-boolean,-Set)|1,082|18.86|23,383,160,065|
|---------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table)|1,082|18.86|23,383,160,065|
|---------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition)|1,096|19.104|21,991,962,733|
|------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition)|1,096|19.104|21,991,962,733|
|---------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List)|1,096|19.104|21,991,962,733|
|------------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean)|1,096|19.104|21,991,962,733|
|--------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor)|1,078|18.79|18,164,973,218|
|------------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean)|1,078|18.79|18,164,973,218|
|---------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap)|1,078|18.79|18,164,973,218|
|------------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition)|1,078|18.79|18,164,973,218|
|---------------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition)|1,078|18.79|18,164,973,218|
|------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition)|1,078|18.79|18,164,973,218|
|---------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List)|1,078|18.79|18,164,973,218|
|------org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.<init>(Listing,-Path,-ListObjectsRequest)|1,084|18.895|19,949,598,696|
|---------org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Path,-ListObjectsRequest,-PathFilter,-Listing$FileStatusAcceptor,-RemoteIterator)|1,084|18.895|19,949,598,696|
|------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor)|1,084|18.895|19,949,598,696|
|---------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean)|1,084|18.895|19,949,598,696|
|------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap)|1,084|18.895|19,949,598,696|
|--------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition)|1,084|18.895|19,949,598,696|




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
