hive-dev mailing list archives

From "He Yongqiang (JIRA)" <>
Subject [jira] Commented: (HIVE-1754) Remove JDBM component from Map Join
Date Wed, 10 Nov 2010 01:25:23 GMT


He Yongqiang commented on HIVE-1754:

1. code style:

A new file always needs an Apache license header.

For example:

public class PathUtil {
  public static String suffix=".hashtable";
  public static String generatePath(String baseURI,Byte tag,String bigBucketFileName){
    String path = new String(baseURI+Path.SEPARATOR+"-"+tag+"-"+bigBucketFileName+suffix);
    return path;
  }
  public static String generateFileName(Byte tag,String bigBucketFileName){
    String fileName = new String("-"+tag+"-"+bigBucketFileName+suffix);
    return fileName;
  }

  public static String generateTmpURI(String baseURI,String id){
    String tmpFileURI = new String(baseURI+Path.SEPARATOR+"HashTable-"+id);
    return tmpFileURI;
  }
}

Should be formatted as:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.hive.ql.util;

import org.apache.hadoop.fs.Path;

public class PathUtil {

  public static String suffix = ".hashtable";

  public static String generatePath(String baseURI, Byte tag,
      String bigBucketFileName) {
    String path = new String(baseURI + Path.SEPARATOR + "-" + tag + "-"
        + bigBucketFileName + suffix);
    return path;
  }

  public static String generateFileName(Byte tag, String bigBucketFileName) {
    String fileName = new String("-" + tag + "-" + bigBucketFileName + suffix);
    return fileName;
  }

  public static String generateTmpURI(String baseURI, String id) {
    String tmpFileURI = new String(baseURI + Path.SEPARATOR + "HashTable-" + id);
    return tmpFileURI;
  }
}

Let's put and into a HiveUtil class, like Utilities in exec (or
create a new one and put in exec package or common package). 


-    //Qualify the path against the filesystem. The user configured path might contain default port which is skipped
-    //in the file status. This makes sure that all paths which goes into PathToPartitionInfo are always listed status
-    //filepath.
-    newPath = fs.makeQualified(newPath);

Why were these lines removed? They should stay.
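
To illustrate the concern in the removed comment, here is a minimal sketch of how a path that spells out a default port can miss a lookup keyed by the port-less form. It uses plain java.net.URI rather than Hadoop's FileSystem, and QualifySketch, normalize, and the port value 8020 are assumptions made up for this example, not anything taken from the patch.

```java
import java.net.URI;

// Hypothetical illustration only; this is not Hadoop's makeQualified.
class QualifySketch {

  // Assumed default port, chosen just for this example.
  static final int DEFAULT_PORT = 8020;

  // Drops the port when it equals the assumed default, so that both
  // spellings of the same location produce the same lookup key.
  static String normalize(String path) {
    URI u = URI.create(path);
    if (u.getPort() == DEFAULT_PORT) {
      return u.getScheme() + "://" + u.getHost() + u.getPath();
    }
    return path;
  }
}
```

Without a normalization step of this kind, the two spellings compare as different strings, so a map keyed by one form would not find entries stored under the other.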

Revert the changes in ExecMapper. Keep it clean.

Code style in HashTableDummyOperator: add a default serial version ID (serialVersionUID). Do not
use 2 blank lines inside a method. Keep at least one blank line between 2 method definitions.
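
A minimal sketch of those style points, assuming a made-up class DummyOperatorSketch (it is not the real HashTableDummyOperator, and its field and methods are hypothetical):

```java
import java.io.Serializable;

// Hypothetical stand-in used only to show the style points.
class DummyOperatorSketch implements Serializable {

  // Explicit default serial version ID instead of a compiler-computed one.
  private static final long serialVersionUID = 1L;

  private String name = "dummy";

  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }
}
```

Each method is separated from the next by one blank line, and no method body contains consecutive blank lines.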

Remove the never-read variables from HashTableSinkOperator.
  protected transient
  Map<Byte, List<ObjectInspector>> rowContainerStandardObjectInspectors;
should be on one line.

generateMapMetaData(); can be moved into init(). MapJoinRowContainer res = null; should be parameterized.
int bucketSize = HiveConf.getIntVar(hconf, HiveConf.ConfVars.HIVEMAPJOINBUCKETCACHESIZE);
should also be moved into init(), and bucketSize can be a class field.
res.add(value); is duplicated in both branches of the if () {} else {}. Put it once after the if/else.
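
The points above could look roughly like the following sketch. It is self-contained and does not use Hive classes: SinkSketch, process, rows, and the String key and value types are hypothetical stand-ins, and the real branch logic in HashTableSinkOperator may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; none of these names are the real HashTableSinkOperator API.
class SinkSketch {

  // Read once in init() and kept as a field, instead of being looked up per row.
  private int bucketSize;

  // Parameterized types rather than raw containers.
  private Map<String, List<String>> table;

  public void init(int configuredBucketSize) {
    bucketSize = configuredBucketSize;
    table = new HashMap<String, List<String>>();
  }

  public void process(String key, String value) {
    List<String> res = table.get(key);
    if (res == null) {
      res = new ArrayList<String>(bucketSize);
      table.put(key, res);
    }
    // One add after the branch, not one copy in each arm of an if/else.
    res.add(value);
  }

  public List<String> rows(String key) {
    return table.get(key);
  }
}
```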

In close(), if abort is true, do we still need to do the dump?

          String bigBucketFileName = this.getExecContext().getCurrentBigBucketFile();
          if(bigBucketFileName == null ||bigBucketFileName.length()==0) {

I guess that if we run it locally, bigBucketFileName is always null. Is that true? If so,
how does this patch handle the bucket map join?

Revert the changes to MapRedTask.

AbstractRowContainer/MapJoinDoubleKeys/MapJoinRowContainer/MapJoinSingleKey are missing the Apache
license header.

Please make sure to clean up the code.

> Remove JDBM component from Map Join
> -----------------------------------
>                 Key: HIVE-1754
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>         Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, hive-1754_4.patch,
> Right now, JDBM is the major performance bottleneck of performance.
> With the growth of the small table, the PUT and GET operation will take most of execution
> Map Join is designed to load the data of small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the map join

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
