hive-commits mailing list archives

Subject: svn commit: r1301310 - /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/
Date: Fri, 16 Mar 2012 01:50:18 GMT
Author: cws
Date: Fri Mar 16 01:50:17 2012
New Revision: 1301310

HIVE-2778 [jira] Fail on table sampling
(Navis Ryu via Carl Steinbach)

HIVE-2778 fix NPE on table sampling

Trying table sampling on any non-empty table throws an NPE. This does not occur in tests on mini-MR.

    select count(*) from emp tablesample (0.1 percent);

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    java.lang.NullPointerException
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(
        at org.apache.hadoop.mapred.JobClient.writeSplits(
        at org.apache.hadoop.mapred.JobClient.access$500(
        at org.apache.hadoop.mapred.JobClient$
        at org.apache.hadoop.mapred.JobClient$
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(
        at org.apache.hadoop.mapred.JobClient.submitJob(
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(
        at org.apache.hadoop.hive.ql.Driver.launchTask(
        at org.apache.hadoop.hive.ql.Driver.execute(
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(
        at org.apache.hadoop.hive.cli.CliDriver.processLine(
        at org.apache.hadoop.hive.cli.CliDriver.main(
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.lang.reflect.Method.invoke(
        at org.apache.hadoop.util.RunJar.main(
    Job Submission failed with exception 'java.lang.NullPointerException(null)'
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
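The cause, per the diff further down, is that a split can hand back a path without a URI scheme, while pathToAliases is keyed by fully qualified paths, so the alias lookup comes back null and blows up later as the NPE above. A minimal, self-contained sketch of the idea (hypothetical names; plain java.net.URI stands in for Hadoop's Path, which the patch uses via Path.toUri().getPath()):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SchemeDemo {
    // Hypothetical helper mirroring the patch's removeScheme(): rebuild the
    // map with scheme-less keys so a scheme-less split path can match.
    static Map<String, ArrayList<String>> removeScheme(
            Map<String, ArrayList<String>> pathToAliases) {
        Map<String, ArrayList<String>> result =
                new HashMap<String, ArrayList<String>>();
        for (Map.Entry<String, ArrayList<String>> e : pathToAliases.entrySet()) {
            // URI.getPath() drops the scheme and authority, keeping the path.
            result.put(URI.create(e.getKey()).getPath(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, ArrayList<String>> pathToAliases =
                new HashMap<String, ArrayList<String>>();
        pathToAliases.put("hdfs://namenode:8020/user/hive/warehouse/emp",
                new ArrayList<String>(Arrays.asList("emp")));

        // A scheme-less split path misses the fully qualified key, so the
        // lookup yields null -- the seed of the NPE in the stack trace.
        String splitPath = "/user/hive/warehouse/emp";
        System.out.println(pathToAliases.get(splitPath));               // null
        System.out.println(removeScheme(pathToAliases).get(splitPath)); // [emp]
    }
}
```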

Test Plan: EMPTY

Reviewers: JIRA, cwsteinbach

Reviewed By: cwsteinbach

Differential Revision:


Modified: hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/
--- hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/ (original)
+++ hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/ Fri Mar 16 01:50:17 2012
@@ -435,14 +435,16 @@ public class CombineHiveInputFormat<K ex
     List<InputSplitShim> retLists = new ArrayList<InputSplitShim>();
     Map<String, ArrayList<InputSplitShim>> aliasToSplitList = new HashMap<String, ArrayList<InputSplitShim>>();
     Map<String, ArrayList<String>> pathToAliases = mrwork.getPathToAliases();
+    Map<String, ArrayList<String>> pathToAliasesNoScheme = removeScheme(pathToAliases);
     // Populate list of exclusive splits for every sampled alias
     for (InputSplitShim split : splits) {
       String alias = null;
       for (Path path : split.getPaths()) {
+        boolean schemeless = path.toUri().getScheme() == null;
         List<String> l = HiveFileFormatUtils.doGetAliasesFromPath(
-            pathToAliases, path);
+            schemeless ? pathToAliasesNoScheme : pathToAliases, path);
         // a path disqualifies the split from being sampled if:
         // 1. it serves more than one alias
         // 2. the alias it serves is not sampled
@@ -500,6 +502,15 @@ public class CombineHiveInputFormat<K ex
     return retLists;
   }

+  Map<String, ArrayList<String>> removeScheme(Map<String, ArrayList<String>> pathToAliases) {
+    Map<String, ArrayList<String>> result = new HashMap<String, ArrayList<String>>();
+    for (Map.Entry<String, ArrayList<String>> entry : pathToAliases.entrySet()) {
+      String newKey = new Path(entry.getKey()).toUri().getPath();
+      result.put(newKey, entry.getValue());
+    }
+    return result;
+  }
+
   /**
    * Create a generic Hive RecordReader that can iterate over all chunks in a
    * CombinedFileSplit.
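The comments in the first hunk state the sampling rules but the surrounding loop body is elided in this excerpt. As a rough illustrative sketch (all names hypothetical, not Hive's actual API), a split qualifies for sampling only when every one of its paths serves exactly one alias and that alias is in the sampled set:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SampleFilter {
    // Hypothetical condensation of the two disqualification rules from the
    // diff comments: more than one alias per path, or an unsampled alias,
    // excludes the whole split.
    static boolean isSampled(List<List<String>> aliasesPerPath,
                             Set<String> sampledAliases) {
        for (List<String> aliases : aliasesPerPath) {
            if (aliases.size() != 1 || !sampledAliases.contains(aliases.get(0))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> sampled = new HashSet<String>(Arrays.asList("emp"));
        // one path, serving only the sampled alias "emp" -> qualifies
        System.out.println(isSampled(
                Arrays.asList(Arrays.asList("emp")), sampled));         // true
        // a path serving two aliases disqualifies the split (rule 1)
        System.out.println(isSampled(
                Arrays.asList(Arrays.asList("emp", "dept")), sampled)); // false
        // a path serving only an unsampled alias disqualifies it (rule 2)
        System.out.println(isSampled(
                Arrays.asList(Arrays.asList("dept")), sampled));        // false
    }
}
```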
