hudi-commits mailing list archives

From "Balaji Varadarajan (Jira)" <j...@apache.org>
Subject [jira] [Created] (HUDI-637) Investigate slower hudi queries in S3 vs HDFS
Date Thu, 27 Feb 2020 01:59:00 GMT
Balaji Varadarajan created HUDI-637:
---------------------------------------

             Summary: Investigate slower hudi queries in S3 vs HDFS
                 Key: HUDI-637
                 URL: https://issues.apache.org/jira/browse/HUDI-637
             Project: Apache Hudi (incubating)
          Issue Type: Task
          Components: Performance
            Reporter: Balaji Varadarajan
             Fix For: 0.5.2


Hudi queries on S3 take abnormally long compared to the same queries on HDFS.

S3 listing itself does not take that long.

PERFORMANCE BUG:

The performance of metadata listing is likely what is slowing down Hudi queries.

 

{{scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted_s3").count })}}

{{Elapsed time: 1m 55.078473113s}}
{{res2: Long = xxxxxxxxxxxx}}

{{scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted").count })  // this is the exact same table in HDFS}}

{{Elapsed time: 6.581217052s}}
{{res3: Long = xxxxxxxxxxx}}
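
The issue does not show how {{stopwatch}} is defined. To reproduce the measurement, a minimal sketch of such a helper might look like the following. The {{Timing}} object, the {{formatElapsed}} function, and the exact output format are assumptions for illustration, not the reporter's actual code.

```scala
// Hypothetical timing helper; the original `stopwatch` definition is not
// included in the issue, so names and output format here are assumptions.
object Timing {

  // Render a nanosecond duration as "<m>m <s>s", or "<s>s" under one minute.
  def formatElapsed(nanos: Long): String = {
    val nanosPerMinute = 60000000000L
    val minutes = nanos / nanosPerMinute
    val seconds = (nanos % nanosPerMinute) / 1e9
    if (minutes > 0) s"${minutes}m ${seconds}s" else s"${seconds}s"
  }

  // Evaluate `block` exactly once, print the wall-clock time it took,
  // and return its result unchanged.
  def stopwatch[T](block: => T): T = {
    val start = System.nanoTime()
    val result = block
    val elapsed = System.nanoTime() - start
    println(s"Elapsed time: ${formatElapsed(elapsed)}")
    result
  }
}
```

After {{import Timing._}} in spark-shell, both calls above can be run as written. Going by the timings reported in this issue, 115.08 s on S3 against 6.58 s on HDFS is roughly a 17.5x difference for the same table.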



--
This message was sent by Atlassian Jira
(v8.3.4#803005)