hadoop-common-user mailing list archives

From "Tom White" <tom.e.wh...@gmail.com>
Subject Input file globbing
Date Thu, 20 Mar 2008 16:43:17 GMT
I'm trying to use file globbing to select various input paths, like so:

conf.setInputPath(new Path("mr/input/glob/2008/02/{02,08}"));

But this gives an exception:

Exception in thread "main" java.io.IOException: Illegal file pattern:
Expecting set closure character or end of range, or } for glob {02 at
	at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1023)
	at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1008)
	at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:926)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:826)
	at org.apache.hadoop.fs.FileSystem.globPaths(FileSystem.java:873)
	at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:131)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:541)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:809)

Looking at the code for JobConf.getInputPaths I see it tokenizes using
a comma as the delimiter, producing two paths
"mr/input/glob/2008/02/{02" and "08}". This looks like a bug to me.
I'm surprised as this feature has been around for some time - are
folks not using it like this?
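
The splitting described above can be reproduced outside Hadoop. This is a minimal sketch, not the actual JobConf source: the class GlobSplitDemo and both method names are hypothetical. naiveSplit assumes the tokenization is a plain split on commas, as the paragraph above describes, and braceAwareSplit shows one possible way to keep commas inside {...} together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class GlobSplitDemo {

    // Assumed behaviour: split the path list on every comma,
    // including commas that sit inside a {..} glob alternation.
    static List<String> naiveSplit(String paths) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(paths, ",");
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    // Hypothetical alternative: only treat a comma as a path
    // separator when it is outside any {..} group.
    static List<String> braceAwareSplit(String paths) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : paths.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                out.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String glob = "mr/input/glob/2008/02/{02,08}";
        List<String> naive = naiveSplit(glob);
        List<String> aware = braceAwareSplit(glob);
        System.out.println("naive: " + naive.size() + " path(s)");
        System.out.println("brace-aware: " + aware.size() + " path(s)");
    }
}
```

With the glob from this message, the naive version yields the two broken fragments quoted above, while the brace-aware version returns the pattern as a single path. A plain comma-separated list with no braces is split the same way by both.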