hive-dev mailing list archives

From "Sailee Jain (JIRA)" <>
Subject [jira] [Created] (HIVE-16999) Performance bottleneck in the add_resource api
Date Fri, 30 Jun 2017 01:08:00 GMT
Sailee Jain created HIVE-16999:

             Summary: Performance bottleneck in the add_resource api
                 Key: HIVE-16999
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Sailee Jain
            Priority: Critical

A performance bottleneck was found when adding a resource that lies on HDFS to the distributed cache.

Commands used:
{{1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"}}
Here is the log for the ADD ARCHIVE operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive downloads the resource to the local filesystem (shown in the log by "converting to local").

Ideally, there is no need to bring the file to the local filesystem, because this operation
only copies the file from one location on HDFS to another location on HDFS (the distributed
cache). This becomes a significant performance bottleneck when the resource is a large file
and all commands need the same resource.
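
The routing argued for above can be sketched roughly as follows. This is only an illustration and not Hive code: the class ResourceRouting, the Action enum and the route method are hypothetical names, and the KEEP_ON_HDFS outcome merely stands in for an HDFS-to-HDFS copy that is not implemented here. The sketch only inspects the URI scheme with java.net.URI and assumes that a path without a scheme is a local file.

```java
import java.net.URI;
import java.util.Locale;

public class ResourceRouting {

    /** Hypothetical outcomes for illustration; these names do not come from Hive. */
    public enum Action { USE_LOCAL_PATH, RESOLVE_IVY, KEEP_ON_HDFS, DOWNLOAD_TO_LOCAL }

    /** Returns the lower-cased URI scheme; a scheme-less path is assumed to be a local file. */
    static String schemeOf(String value) {
        String scheme = URI.create(value).getScheme();
        return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
    }

    /** Decides how a resource would be handled, based only on its scheme. */
    public static Action route(String value) {
        switch (schemeOf(value)) {
            case "file":
                return Action.USE_LOCAL_PATH;    // already on the local filesystem
            case "ivy":
                return Action.RESOLVE_IVY;       // dependency resolution
            case "hdfs":
                return Action.KEEP_ON_HDFS;      // assumption: no local round trip needed
            default:
                return Action.DOWNLOAD_TO_LOCAL; // any other remote scheme
        }
    }

    public static void main(String[] args) {
        System.out.println(route("hdfs://some_dir/archive.tar"));
        System.out.println(route("/tmp/file.txt"));
    }
}
```

Under this assumption, only resources that are neither local, ivy, nor already on HDFS would still go through a local download.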
After debugging, the impacted piece of code was found to be:

{{public List<String> add_resources(ResourceType t, Collection<String> values,
      boolean convertToUnix) throws RuntimeException {
    Set<String> resourceSet = resourceMaps.getResourceSet(t);
    Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
    Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
    List<String> localized = new ArrayList<String>();
    try {
      for (String value : values) {
        String key;
        // get the local path of downloaded jars.
        List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);}}

{{List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));}}



This message was sent by Atlassian JIRA
