hive-dev mailing list archives

From "liyunzhang_intel (JIRA)" <>
Subject [jira] [Created] (HIVE-16980) The partition of join is not divided evently in HOS
Date Wed, 28 Jun 2017 03:39:00 GMT
liyunzhang_intel created HIVE-16980:

             Summary: The partition of join is not divided evently in HOS
                 Key: HIVE-16980
             Project: Hive
          Issue Type: Bug
            Reporter: liyunzhang_intel

In HoS (Hive on Spark), a join is implemented as a union followed by a repartition sort.
We use HashPartitioner to partition the result of the union:
    public JavaPairRDD<HiveKey, BytesWritable> shuffle(
        JavaPairRDD<HiveKey, BytesWritable> input, int numPartitions) {
      JavaPairRDD<HiveKey, BytesWritable> rdd;
      if (totalOrder) {
        // Total ordering: sortByKey range-partitions and sorts by key.
        if (numPartitions > 0) {
          if (numPartitions > 1 && input.getStorageLevel() == StorageLevel.NONE()) {
            // (the body of this branch is missing from the original message)
          }
          rdd = input.sortByKey(true, numPartitions);
        } else {
          rdd = input.sortByKey(true);
        }
      } else {
        // Otherwise: hash-partition on the key, then sort within each partition.
        Partitioner partitioner = new HashPartitioner(numPartitions);
        rdd = input.repartitionAndSortWithinPartitions(partitioner);
      }
      return rdd;
    }
In the Spark history server, I saw 28 tasks in the repartition-sort stage: 21 of them
finished in less than 1s, while the remaining 7 took a long time to execute. Is there any
way to assign the data evenly to every partition?
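
For illustration only, here is a minimal plain-Java sketch of how hash partitioning can give this kind of 21-fast / 7-slow split when a few keys carry most of the rows. It does not use Spark or Hive. The key distribution is made up and is not taken from the reported job, and the names HashPartitionSkewDemo, partitionFor, countPerPartition and sampleKeys are hypothetical. The only part modeled on Spark is the partitioning rule: HashPartitioner assigns a key to its hashCode modulo the number of partitions, adjusted to be non-negative. Whether key skew is the actual cause here is an assumption and not confirmed by this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HashPartitionSkewDemo {

    // Same rule as Spark's HashPartitioner: hashCode modulo numPartitions,
    // shifted into the non-negative range.
    public static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Count how many records land in each partition.
    public static int[] countPerPartition(List<?> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (Object key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // Hypothetical join keys (an assumption, not data from the report):
    // keys 0..6 appear 1000 times each, keys 7..27 appear once each.
    public static List<Integer> sampleKeys() {
        List<Integer> keys = new ArrayList<>();
        for (int key = 0; key < 28; key++) {
            int copies = key < 7 ? 1000 : 1;
            for (int i = 0; i < copies; i++) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        int numPartitions = 28; // task count mentioned in the report
        int[] counts = countPerPartition(sampleKeys(), numPartitions);
        // All rows sharing a key go to one partition, so the 7 hot keys
        // produce 7 large partitions and the rest stay tiny.
        System.out.println(Arrays.toString(counts));
    }
}
```

Because every row with the same key must go to the same partition, raising numPartitions alone would not split a hot key in this sketch. Evening out such data would need a different key distribution or a different partitioning scheme.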

This message was sent by Atlassian JIRA
