hudi-commits mailing list archives

From "Balaji Varadarajan (Jira)" <>
Subject [jira] [Created] (HUDI-637) Investigate slower hudi queries in S3 vs HDFS
Date Thu, 27 Feb 2020 01:59:00 GMT
Balaji Varadarajan created HUDI-637:

             Summary: Investigate slower hudi queries in S3 vs HDFS
                 Key: HUDI-637
             Project: Apache Hudi (incubating)
          Issue Type: Task
          Components: Performance
            Reporter: Balaji Varadarajan
             Fix For: 0.5.2

Hudi queries on S3 take abnormally longer than the same queries on HDFS.

S3 listing itself does not take that long.


The metadata listing performance is likely causing the performance issues with Hudi.


{{scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted_s3").count })}}

{{Elapsed time: 1m 55.078473113s
res2: Long = xxxxxxxxxxxx}}


{{scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted").count })  // this is the exact same table in HDFS}}

{{Elapsed time: 6.581217052s
res3: Long = xxxxxxxxxxx}}
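
The stopwatch helper used in the two measurements above is not defined in this report. A minimal sketch of what such a helper might look like is given below, assuming it simply times a block with System.nanoTime and prints the elapsed time in the same "1m 55.078473113s" style. The names formatElapsed and stopwatch here are hypothetical stand-ins, not the reporter's actual code.

```scala
// Hypothetical stand-in for the `stopwatch` helper used in this report;
// the reporter's actual implementation is not shown.

// Format a nanosecond duration as "<m>m <s>.<nanos>s", omitting the
// minutes part when the duration is under one minute.
def formatElapsed(nanos: Long): String = {
  val totalSeconds = nanos / 1000000000L
  val minutes = totalSeconds / 60
  val seconds = totalSeconds % 60
  val fraction = nanos % 1000000000L
  val secs = f"$seconds%d.$fraction%09ds"
  if (minutes > 0) s"${minutes}m $secs" else secs
}

// Evaluate `body` once, print how long it took, and return its result.
def stopwatch[T](body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(s"Elapsed time: ${formatElapsed(System.nanoTime() - start)}")
  result
}
```

Pasted into a Scala REPL or spark-shell, this can be called the same way as in the report, for example stopwatch({ sql("SELECT ...").count }). Because it wraps the whole action, the printed time covers planning and file listing as well as the scan itself, so it does not by itself show which phase is slow.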

This message was sent by Atlassian Jira
