hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] bwu2 edited a comment on issue #1328: Hudi upsert hangs
Date Fri, 14 Feb 2020 02:51:25 GMT
bwu2 edited a comment on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-586071719
 
 
   Ok, thanks for this. 
   
   I have run the jobs again: first insert 4m records, then upsert 3m of them, then upsert
all 4m, then upsert 3m again. The two jobs upserting 3m records run fine and quickly, but the
one upserting 4m takes >200 times as long. There is no partitioning and only one (small)
output file.
   
   My results (from a synthetic dataset) are:
   ```bash
   hudi:json_data->commits show --limit 4
   ╔════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
   ║ CommitTime     │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
   ╠════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
   ║ 20200214013937 │ 25.5 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 3000000                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213224532 │ 25.5 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 4000000                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213224325 │ 25.6 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 3000000                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213224218 │ 25.5 MB             │ 1                 │ 0                   │ 1                        │ 4000000               │ 0                            │ 0            ║
   ╚════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
   ```
   
   and the times:
   ```bash
   grep -n -e totalCreateTime -e totalUpsertTime  *.commit
   20200213224218.commit:36:  "totalCreateTime" : 30012,
   20200213224218.commit:37:  "totalUpsertTime" : 0,
   20200213224325.commit:36:  "totalCreateTime" : 0,
   20200213224325.commit:37:  "totalUpsertTime" : 46879,
   20200213224532.commit:36:  "totalCreateTime" : 0,
   20200213224532.commit:37:  "totalUpsertTime" : 10347280,
   20200214013937.commit:36:  "totalCreateTime" : 0,
   20200214013937.commit:37:  "totalUpsertTime" : 44598,
   ```
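
As a rough check of the slowdown, the ratio can be computed directly from the `totalUpsertTime` values in the grep output. This is only a minimal sketch: the numbers are copied by hand from that output, the `slowdown` helper is a hypothetical name introduced for illustration, and the timings are assumed to share one unit (probably milliseconds, which is not confirmed here).

```python
# totalUpsertTime per commit, copied from the grep output.
# Assumption: all values share the same unit (probably milliseconds).
upsert_times = {
    "20200213224325": 46879,     # upsert of 3m records
    "20200213224532": 10347280,  # upsert of 4m records (the slow one)
    "20200214013937": 44598,     # upsert of 3m records
}


def slowdown(slow: int, fast: int) -> float:
    """Return how many times longer `slow` took than `fast` (hypothetical helper)."""
    return slow / fast


slow_commit = "20200213224532"
# Compare against the slower of the two 3m upserts to get a conservative ratio.
baseline = max(t for commit, t in upsert_times.items() if commit != slow_commit)
ratio = slowdown(upsert_times[slow_commit], baseline)

print(f"4m upsert took about {ratio:.0f}x as long as the slower 3m upsert")
```

Taking the slower of the two 3m upserts as the baseline keeps the estimate conservative; using the faster one would give a slightly larger ratio.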

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
