hadoop-hive-dev mailing list archives

From "Sachin Bochare (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-678) Add support for building index table
Date Sun, 16 May 2010 15:06:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867996#action_12867996 ]

Sachin Bochare commented on HIVE-678:
-------------------------------------

I was doing some performance analysis of indexing using this indexing patch. I found
three issues in the patch and fixed them temporarily so that I could move ahead with my experiments.

The three issues are:
# There was a regression when selecting data from a table: simple select queries returned 0 rows, because the temporary output path used by Hive was not set properly.
# The index records were off by one: each index entry held the offset of the next record instead of the offset of the current one.
# The file name and the first offset value were not separated by a delimiter.
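The second issue comes from the order of two calls: if the position is taken after reading a record, it is already the start of the following record. A minimal sketch of that effect, assuming a toy in-memory reader over newline-terminated records (ToyReader, OffsetOrderDemo and both helper methods are hypothetical, not Hive or Hadoop APIs):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the off-by-one; not Hive's HiveIndexRecordReader.
public class OffsetOrderDemo {

    // Toy reader over newline-terminated records held in a String.
    static final class ToyReader {
        private final String data;
        private int pos = 0;

        ToyReader(String data) {
            this.data = data;
        }

        // Character offset where the next read will start.
        long getPos() {
            return pos;
        }

        // Reads one record into out and advances past its '\n'; false at end of input.
        boolean next(StringBuilder out) {
            if (pos >= data.length()) {
                return false;
            }
            int end = data.indexOf('\n', pos);
            if (end < 0) {
                end = data.length();
            }
            out.setLength(0);
            out.append(data, pos, end);
            pos = Math.min(end + 1, data.length());
            return true;
        }
    }

    // Read first, then take the position: each key is the start of the FOLLOWING record.
    static List<Long> keysReadThenGetPos(String data) {
        ToyReader reader = new ToyReader(data);
        StringBuilder value = new StringBuilder();
        List<Long> keys = new ArrayList<>();
        while (reader.next(value)) {
            keys.add(reader.getPos());
        }
        return keys;
    }

    // Take the position, then read: each key is the start of the record just read.
    static List<Long> keysGetPosThenRead(String data) {
        ToyReader reader = new ToyReader(data);
        StringBuilder value = new StringBuilder();
        List<Long> keys = new ArrayList<>();
        while (true) {
            long start = reader.getPos();
            if (!reader.next(value)) {
                break;
            }
            keys.add(start);
        }
        return keys;
    }

    public static void main(String[] args) {
        String data = "aa\nbbb\nc\n"; // records start at offsets 0, 3 and 7
        System.out.println("read, then getPos: " + keysReadThenGetPos(data)); // [3, 7, 9]
        System.out.println("getPos, then read: " + keysGetPosThenRead(data)); // [0, 3, 7]
    }
}
```

Only the second ordering maps each record to its own starting offset, which is what an index entry needs in order to seek back to that record.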

I will send you my experiment results in a day or two.

Following is the patch that fixes those three issues. It needs to be applied after the
attached patch (hive-678-2009-07-25.patch).

{code}
diff -ur Hive-796926-patch-HIVE-678/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java Hive-796926-patch-HIVE-678-Modified/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
--- Hive-796926-patch-HIVE-678/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java	2010-05-16 20:05:58.000000000 +0530
+++ Hive-796926-patch-HIVE-678-Modified/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java	2010-05-16 20:10:56.000000000 +0530
@@ -348,8 +348,12 @@
     }

     String hiveScratchDir = getScratchDir();
-    Path jobScratchDir = new Path(hiveScratchDir);
-    Path outPath = new Path(getOutputPath());
+    Path jobScratchDir = new Path(hiveScratchDir + Utilities.randGen.nextInt());
+    Path outPath = jobScratchDir;
+    if (outputPath != null) {
+        outPath = new Path(outputPath);
+    }
+
     if (isDelOutputIfExists()) {
       try {
         FileSystem outFs = outPath.getFileSystem(job);
diff -ur Hive-796926-patch-HIVE-678/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexRecordReader.java Hive-796926-patch-HIVE-678-Modified/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexRecordReader.java
--- Hive-796926-patch-HIVE-678/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexRecordReader.java	2010-05-16 20:05:58.000000000 +0530
+++ Hive-796926-patch-HIVE-678-Modified/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexRecordReader.java	2010-05-16 20:11:38.000000000 +0530
@@ -64,7 +64,6 @@
    * to upper.
    */
   public boolean next(Object key, Object value) throws IOException {
-    boolean result = rawReader.next(rawKey, value);
     if (!blockCompressed) {
       try {
         ((LongWritable) key).set(rawReader.getPos());
@@ -74,6 +73,7 @@
     } else {
       ((LongWritable) key).set(blockStart);
     }
+    boolean result = rawReader.next(rawKey, value);
     return result;
   }

diff -ur Hive-796926-patch-HIVE-678/ql/src/java/org/apache/hadoop/hive/ql/index/IndexBuilderCompactSumReducer.java Hive-796926-patch-HIVE-678-Modified/ql/src/java/org/apache/hadoop/hive/ql/index/IndexBuilderCompactSumReducer.java
--- Hive-796926-patch-HIVE-678/ql/src/java/org/apache/hadoop/hive/ql/index/IndexBuilderCompactSumReducer.java	2010-05-16 20:05:58.000000000 +0530
+++ Hive-796926-patch-HIVE-678-Modified/ql/src/java/org/apache/hadoop/hive/ql/index/IndexBuilderCompactSumReducer.java	2010-05-16 20:12:36.000000000 +0530
@@ -95,6 +95,7 @@
     SortedSet<Long> poses = bucketOffsetMap.get(bucketName);
     Iterator<Long> posIter = poses.iterator();
     if (posIter.hasNext()) {
+      bl.append(HiveIndex.BUCKET_POS_VAL_SEPARATOR);
       bl.append(posIter.next());
     }
     while (posIter.hasNext()) {
{code}
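The effect of the last hunk can be sketched as follows. The separator characters are assumptions: the patch references HiveIndex.BUCKET_POS_VAL_SEPARATOR without showing its value, and the separator between successive offsets is outside the visible hunk, so '|' and ',' are used here for readability. IndexEntryDemo, buildEntry and bucketOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of one compact-index entry: bucket file name, then sorted offsets.
public class IndexEntryDemo {

    // Assumed separators; the real HiveIndex.BUCKET_POS_VAL_SEPARATOR value is not shown in the patch.
    static final char BUCKET_POS_SEP = '|';
    static final char OFFSET_SEP = ',';

    static String buildEntry(String bucketName, SortedSet<Long> poses, boolean separateBucket) {
        StringBuilder bl = new StringBuilder(bucketName);
        Iterator<Long> posIter = poses.iterator();
        if (posIter.hasNext()) {
            if (separateBucket) {
                bl.append(BUCKET_POS_SEP); // mirrors the line added by the last hunk
            }
            bl.append(posIter.next());
        }
        while (posIter.hasNext()) {
            bl.append(OFFSET_SEP);
            bl.append(posIter.next());
        }
        return bl.toString();
    }

    // Recovers the bucket name; only works when the bucket separator is present.
    static String bucketOf(String entry) {
        int i = entry.indexOf(BUCKET_POS_SEP);
        return i < 0 ? entry : entry.substring(0, i);
    }

    public static void main(String[] args) {
        SortedSet<Long> poses = new TreeSet<>(Arrays.asList(4096L, 0L, 120L));
        String before = buildEntry("part-00000", poses, false);
        String after = buildEntry("part-00000", poses, true);
        System.out.println(before);          // part-000000,120,4096
        System.out.println(after);           // part-00000|0,120,4096
        System.out.println(bucketOf(after)); // part-00000
    }
}
```

Without the separator, the file name "part-00000" and the first offset "0" run together into "part-000000", so a reader of the index cannot tell where the name ends.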

Thanks,
Sachin


> Add support for building index table
> ------------------------------------
>
>                 Key: HIVE-678
>                 URL: https://issues.apache.org/jira/browse/HIVE-678
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: hive-678-2009-07-25.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

