hadoop-common-dev mailing list archives

From Dennis Kubes <nutch-...@dragonflymc.com>
Subject Re: Input paths
Date Mon, 16 Oct 2006 14:24:15 GMT
You could write your own InputFormat implementation that checks for 
files instead of directories (perhaps passing in the parent directory of 
the files).  We just did something similar for reading index files as 
an InputFormat.
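
As a rough illustration of the idea only: the sketch below uses plain 
java.nio rather than the Hadoop classes, and the class and method names 
(FileOrDirectoryInputs, areValidInputPaths, listInputFiles) are made up 
for this example.  It accepts an input path that is either a file or a 
directory, and expands each input to the files it covers.  A custom 
InputFormat would presumably do the equivalent against the Hadoop 
FileSystem API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/**
 * Hypothetical helper, not part of Hadoop: validates and expands job
 * inputs that may be individual files as well as directories.
 */
final class FileOrDirectoryInputs {

    private FileOrDirectoryInputs() {}

    /** Returns true only if every input exists as a regular file or a directory. */
    static boolean areValidInputPaths(List<Path> inputs) {
        for (Path p : inputs) {
            if (!Files.isRegularFile(p) && !Files.isDirectory(p)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Expands the inputs to concrete files: a file maps to itself, a
     * directory maps to the regular files directly inside it (sorted,
     * not recursive).
     */
    static List<Path> listInputFiles(List<Path> inputs) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path p : inputs) {
            if (Files.isRegularFile(p)) {
                files.add(p);
            } else if (Files.isDirectory(p)) {
                // Close the directory stream as soon as the listing is consumed.
                try (Stream<Path> children = Files.list(p)) {
                    children.filter(Files::isRegularFile).sorted().forEach(files::add);
                }
            } else {
                throw new IOException("Input path is neither a file nor a directory: " + p);
            }
        }
        return files;
    }
}
```

With something like this, a single file can be passed as an input 
without first copying it into a directory of its own.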


Vetle Roeim wrote:
> It seems that input to jobs is restricted to directories, and it is 
> impossible to add individual files -- JobConf calls 
> InputFormatBase.areValidInputDirectories, which checks that each input 
> path is a directory.
> Why is this required? Is it possible to change it or work around it 
> (without copying the files into a separate directory)?
> Thanks,
> --Vetle Roeim
> Opera Software ASA <URL: http://www.opera.com/ >
