hudi-commits mailing list archives

From "Hong Shen (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HUDI-657) Many redundant ClientUtils.createMetaClient calls when writing to Hudi
Date Thu, 05 Mar 2020 03:49:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Shen updated HUDI-657:
---------------------------
    Description: 
When I write data to Hudi, I see many "Loading table properties" log messages, and each
load takes about 75 ms, as shown below.
{code:java}
20/03/05 09:20:32.379 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml],
FileSystem: [org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem@563f38c4]
20/03/05 09:20:32.431 INFO HoodieTableConfig: Loading table properties from oss://shenhong-test/mr_100_100_100/.hoodie/hoodie.properties
20/03/05 09:20:32.453 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1)
from oss://shenhong-test/mr_100_100_100

{code}
It seems we create a new HoodieTableMetaClient every time one is needed, but that is
unnecessary. For example, getHoodieTable is already passed a metaClient, yet the HoodieTable
constructor still calls ClientUtils.createMetaClient:
{code:java}
public static <T extends HoodieRecordPayload> HoodieTable<T> getHoodieTable(
    HoodieTableMetaClient metaClient, HoodieWriteConfig config, JavaSparkContext jsc) {
  // Note: the metaClient received here is never handed to the table constructors below.
  switch (metaClient.getTableType()) {
    case COPY_ON_WRITE:
      return new HoodieCopyOnWriteTable<>(config, jsc);
    case MERGE_ON_READ:
      return new HoodieMergeOnReadTable<>(config, jsc);
    default:
      throw new HoodieException("Unsupported table type :" + metaClient.getTableType());
  }
}
{code}
{code:java}
  protected HoodieTable(HoodieWriteConfig config, JavaSparkContext jsc) {
    this.config = config;
    this.hadoopConfiguration = new SerializableConfiguration(jsc.hadoopConfiguration());
    this.viewManager = FileSystemViewManager.createViewManager(new SerializableConfiguration(jsc.hadoopConfiguration()),
        config.getViewStorageConfig());
    // Builds a brand-new meta client (reloading hoodie.properties from storage),
    // even though the caller of getHoodieTable already had one.
    this.metaClient = ClientUtils.createMetaClient(jsc, config, true);
    this.index = HoodieIndex.createIndex(config, jsc);
  }

{code}
Can we optimize this?
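One possible fix, sketched below: let the factory forward the meta client it already
receives, and give HoodieTable a constructor that accepts it. This is a minimal sketch under
that assumption; the metaClient-taking constructor overloads are hypothetical, not existing
Hudi code.
{code:java}
// Sketch only: the constructor overloads taking a HoodieTableMetaClient are
// hypothetical and do not exist in the current code base.
public static <T extends HoodieRecordPayload> HoodieTable<T> getHoodieTable(
    HoodieTableMetaClient metaClient, HoodieWriteConfig config, JavaSparkContext jsc) {
  switch (metaClient.getTableType()) {
    case COPY_ON_WRITE:
      // Forward the meta client the caller already built.
      return new HoodieCopyOnWriteTable<>(metaClient, config, jsc);
    case MERGE_ON_READ:
      return new HoodieMergeOnReadTable<>(metaClient, config, jsc);
    default:
      throw new HoodieException("Unsupported table type :" + metaClient.getTableType());
  }
}

protected HoodieTable(HoodieTableMetaClient metaClient, HoodieWriteConfig config,
    JavaSparkContext jsc) {
  this.config = config;
  this.hadoopConfiguration = new SerializableConfiguration(jsc.hadoopConfiguration());
  this.viewManager = FileSystemViewManager.createViewManager(
      new SerializableConfiguration(jsc.hadoopConfiguration()), config.getViewStorageConfig());
  // Reuse the existing meta client instead of calling ClientUtils.createMetaClient,
  // which reloads hoodie.properties from storage (~75 ms per call in the log above).
  this.metaClient = metaClient;
  this.index = HoodieIndex.createIndex(config, jsc);
}
{code}
This would save one hoodie.properties load per table construction; callers that do not
already have a meta client could keep using the existing constructor.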

> Many redundant ClientUtils.createMetaClient calls when writing to Hudi
> ----------------------------------------------------------------------
>
>                 Key: HUDI-657
>                 URL: https://issues.apache.org/jira/browse/HUDI-657
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Hong Shen
>            Priority: Major
>


