hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] vinothchandar commented on issue #1328: Hudi upsert hangs
Date Thu, 13 Feb 2020 21:57:40 GMT
vinothchandar commented on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-585991857
 
 
   There must be something else going on. I just used my own benchmark jobs to generate a pattern where the records are fully overwritten in a second (and a third) batch, and it finishes fine:
   
   ```
   hudi:hoodie_benchmark->connect --path file:///tmp/hudi-benchmark/output/org.apache.hudi
   35394 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Loading HoodieTableMetaClient from file:///tmp/hudi-benchmark/output/org.apache.hudi
   35415 [Spring Shell] INFO  org.apache.hudi.common.util.FSUtils  - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@6851d345]
   35416 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from file:/tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/hoodie.properties
   35416 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of type COPY_ON_WRITE(version=1) from file:///tmp/hudi-benchmark/output/org.apache.hudi
   Metadata for table hoodie_benchmark loaded
   hudi:hoodie_benchmark->commits show 
   36774 [Spring Shell] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded instants [[20200213134159__clean__COMPLETED], [20200213134159__commit__COMPLETED], [20200213134410__clean__COMPLETED], [20200213134410__commit__COMPLETED], [20200213134548__clean__COMPLETED], [20200213134548__commit__COMPLETED]]
   ╔════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
   ║ CommitTime     │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
   ╠════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
   ║ 20200213134548 │ 384.8 MB            │ 0                 │ 34                  │ 3                        │ 4080024               │ 1211376                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213134410 │ 379.9 MB            │ 0                 │ 34                  │ 3                        │ 4040016               │ 1199234                      │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20200213134159 │ 374.8 MB            │ 34                │ 0                   │ 3                        │ 4000008               │ 0                            │ 0            ║
   ╚════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
   
   hudi:hoodie_benchmark->
   ```
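One rough way to read the timeline above: Hudi instant times follow the `yyyyMMddHHmmss` pattern and are assigned when a commit is started, so if the benchmark batches ran back to back (an assumption), the gap between consecutive commit instants is an upper bound on the wall-clock time of the earlier batch. A minimal Python sketch; `gaps_between_commits` is a hypothetical helper, not part of the Hudi CLI:

```python
from datetime import datetime

# Completed commit instants from the "Loaded instants" log line above.
COMMIT_INSTANTS = ["20200213134159", "20200213134410", "20200213134548"]


def parse_instant(instant: str) -> datetime:
    """Convert a Hudi instant time (yyyyMMddHHmmss) into a datetime."""
    return datetime.strptime(instant, "%Y%m%d%H%M%S")


def gaps_between_commits(instants):
    """Seconds between consecutive instants, keyed by the earlier instant.

    Assumes batches ran back to back, so each gap bounds the wall-clock time
    of the earlier batch (write plus any clean and input preparation).
    """
    times = [parse_instant(i) for i in sorted(instants)]
    return {
        start.strftime("%Y%m%d%H%M%S"): (end - start).total_seconds()
        for start, end in zip(times, times[1:])
    }


if __name__ == "__main__":
    for instant, secs in gaps_between_commits(COMMIT_INSTANTS).items():
        print(f"{instant}: next commit started {secs:.0f} s later")
```

For these instants the gaps are 131 s and 98 s. Both are shorter than the roughly 190 s write times reported in the commit metadata, which fits if those totals are summed over parallel write tasks rather than measured end to end.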
   
   And here are the write times in milliseconds:
   
   ```
   grep -n -e totalCreateTime -e totalUpsertTime /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/*.commit
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134159.commit:697:  "totalCreateTime" : 195060,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134159.commit:698:  "totalUpsertTime" : 0,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134410.commit:697:  "totalCreateTime" : 0,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134410.commit:698:  "totalUpsertTime" : 193693,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134548.commit:697:  "totalCreateTime" : 0,
   /tmp/hudi-benchmark/output/org.apache.hudi/.hoodie/20200213134548.commit:698:  "totalUpsertTime" : 182277,
   ```
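To turn those numbers into throughput, divide records written by write time. This is a sketch using the figures copied from the output above; it assumes `totalCreateTime`/`totalUpsertTime` are summed across all write tasks (so the result is records per task-second, not end-to-end throughput), and `records_per_second` is a hypothetical helper:

```python
# Figures copied from the "commits show" table and the grep output above.
# Each entry: (instant, total records written, write time in ms).
COMMITS = [
    ("20200213134159", 4_000_008, 195_060),  # initial insert: totalCreateTime
    ("20200213134410", 4_040_016, 193_693),  # update batch: totalUpsertTime
    ("20200213134548", 4_080_024, 182_277),  # update batch: totalUpsertTime
]
FILES_PER_COMMIT = 34  # files added (first commit) or updated (later ones)


def records_per_second(records: int, millis: int) -> float:
    """Records written per second of (assumed aggregated) write time."""
    return records / (millis / 1000.0)


if __name__ == "__main__":
    for instant, records, millis in COMMITS:
        rate = records_per_second(records, millis)
        # Assumption: time is summed per file, so this is an average per file.
        per_file_s = millis / 1000.0 / FILES_PER_COMMIT
        print(f"{instant}: {rate:,.0f} records/s, ~{per_file_s:.1f} s per file")
```

That works out to roughly 20,500 to 22,400 records per second and about 5.4 to 5.7 s per file for all three batches, so the update batches are no slower than the initial insert.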
   
   
   Can we drill into your dataset? Are you generating tons of files due to granular partitioning? Can you share the Spark UI and the Hudi CLI output like the above?
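For the partitioning question, a quick check is to count data files per partition directory. A minimal sketch, assuming the table sits on a locally accessible path and that base files end in `.parquet` (copy-on-write); `files_per_partition` is a hypothetical helper, and tables on HDFS or S3 would need that filesystem's own listing tools instead:

```python
import os
import sys
from collections import Counter


def files_per_partition(table_path: str, suffix: str = ".parquet") -> Counter:
    """Count data files in each partition directory under table_path.

    Skips the .hoodie metadata folder; the key is the directory path relative
    to the table root ("." for an unpartitioned table).
    """
    counts = Counter()
    for root, dirs, files in os.walk(table_path):
        # Prune Hudi's timeline/metadata directory so os.walk never enters it.
        dirs[:] = [d for d in dirs if d != ".hoodie"]
        n = sum(1 for f in files if f.endswith(suffix))
        if n:
            counts[os.path.relpath(root, table_path)] = n
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_files.py /path/to/hudi/table
    counts = files_per_partition(sys.argv[1])
    print(f"{len(counts)} partitions, {sum(counts.values())} data files")
    # Show the ten partitions holding the most files.
    for partition, n in counts.most_common(10):
        print(f"{n:6d}  {partition}")
```

Thousands of partitions, each holding a handful of small files, would point at partitioning that is too granular.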

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
