hive-dev mailing list archives

From Eric Karlson <ekarl...@keywcorp.com>
Subject Possible bug in HiveInputSplit.getPath()
Date Wed, 17 Jul 2013 07:40:28 GMT
I've been developing a HiveStorageHandler class (and associated classes) to integrate a non-file-based
table storage engine into Hive.  I am currently working with version 1.3 of the Hortonworks
distro, but the issue that I've run into appears to be present in the Apache code base
as well.

The specific issue is that when the MapReduce job runs, the map task dies with the
following exception:

java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
	at org.apache.hadoop.fs.Path.<init>(Path.java:90)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:106)
	at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

Looking at the code for HiveInputFormat.HiveInputSplit.getPath() I find the following:

public Path getPath() {
  if (inputSplit instanceof FileSplit) {
    return ((FileSplit) inputSplit).getPath();
  }
  return new Path("");
}

This means that if my InputFormat.getSplits() method returns InputSplit objects that do not
derive from FileSplit (which is the case for my InputFormat class, as my storage engine is
not file-based), the 'getPath()' method falls through and tries to return 'new Path("")'.

The problem is that the code for the Path class specifically disallows constructing an instance
of Path with an empty string.  Here is the code for Path.checkPathArg():

private void checkPathArg( String path ) {
  // disallow construction of a Path from an empty string
  if ( path == null) {
    throw new IllegalArgumentException( "Can not create a Path from a null string");
  }
  if ( path.length() == 0 ) {
    throw new IllegalArgumentException( "Can not create a Path from an empty string");
  }
}

So whenever HiveInputSplit.getPath() is called and 'inputSplit' is not an instance of 'FileSplit',
it tries to construct a Path from an empty string, which always fails with the exception above.
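
To make the failure easy to see outside a cluster, here is a minimal sketch that reproduces it.
It does not use the real Hadoop or Hive classes: Path, InputSplit, FileSplit and HiveInputSplit
below are simplified stand-ins that copy only the logic quoted above, and NonFileSplit and
GetPathRepro are hypothetical names made up for the illustration.

```java
// Stand-in types: simplified, hypothetical mirrors of the Hadoop/Hive classes
// quoted above, so the sketch compiles without Hadoop on the classpath.
interface InputSplit {}

class Path {
    private final String path;

    Path(String path) {
        // Same two checks as the quoted Path.checkPathArg().
        if (path == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        if (path.length() == 0) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
        this.path = path;
    }

    @Override
    public String toString() {
        return path;
    }
}

class FileSplit implements InputSplit {
    private final Path file;

    FileSplit(Path file) {
        this.file = file;
    }

    Path getPath() {
        return file;
    }
}

// Hypothetical split for a storage engine that has no files.
class NonFileSplit implements InputSplit {}

class HiveInputSplit {
    private final InputSplit inputSplit;

    HiveInputSplit(InputSplit inputSplit) {
        this.inputSplit = inputSplit;
    }

    // Same logic as the quoted getPath().
    Path getPath() {
        if (inputSplit instanceof FileSplit) {
            return ((FileSplit) inputSplit).getPath();
        }
        return new Path("");
    }
}

public class GetPathRepro {
    // Returns the path as a string, or the exception message if getPath() throws.
    static String tryGetPath(InputSplit split) {
        try {
            return new HiveInputSplit(split).getPath().toString();
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A file-based split returns its own path.
        System.out.println(tryGetPath(new FileSplit(new Path("/warehouse/t/part-0"))));
        // A non-file split hits the 'new Path("")' fallback and fails.
        System.out.println(tryGetPath(new NonFileSplit()));
    }
}
```

With these stand-ins, the file-based split returns its path and the non-file split reports the
same "Can not create a Path from an empty string" message that appears in the stack trace.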

So my question is: If this is a bug in Hive, can we get it fixed?  If it is not a bug in Hive
but rather a misunderstanding on my part, could someone give me some pointers on how to use
InputSplit objects that do not derive from FileSplit in such a way as to avoid tripping this
issue?

Thank you for your time.

Eric Karlson
