hudi-commits mailing list archives

From "Yanjia Gary Li (Jira)" <>
Subject [jira] [Created] (HUDI-494) [DEBUGGING] Huge amount of tasks when writing files into HDFS
Date Fri, 03 Jan 2020 05:08:00 GMT
Yanjia Gary Li created HUDI-494:

             Summary: [DEBUGGING] Huge amount of tasks when writing files into HDFS
                 Key: HUDI-494
             Project: Apache Hudi (incubating)
          Issue Type: Test
            Reporter: Yanjia Gary Li
            Assignee: Vinoth Chandar
         Attachments: Screen Shot 2020-01-02 at 8.53.24 PM.png, Screen Shot 2020-01-02 at 8.53.44 PM.png

I am using a manually built master after the [] commit.

I am seeing 3 million tasks when the Hudi Spark job writes the files into HDFS.

I am seeing a huge amount of 0-byte files being written into the .hoodie/.temp/ folder in my HDFS.
In the Spark UI, each task writes fewer than 10 records in the stage "count at HoodieSparkSqlWriter".
All the stages before this seem normal. Any idea what happened here?
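For context, one common first check in a situation like this (a hedged sketch, not from the original message): the number of write tasks a Hudi job launches is bounded by its shuffle-parallelism write configs, so a very large configured parallelism combined with a small input can produce many near-empty tasks and 0-byte marker files. The config keys below are real Hudi write options; the values are purely illustrative.

```
# Hudi write configs that bound the number of write tasks / output files.
# Values here are illustrative examples, not recommendations for this job.
hoodie.insert.shuffle.parallelism=200
hoodie.upsert.shuffle.parallelism=200
hoodie.bulkinsert.shuffle.parallelism=200
```

These can be passed as `.option(...)` entries on the Spark DataFrame writer; comparing the configured parallelism against the actual input size is a quick way to tell whether the 3 million tasks come from the data or from the configuration.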


