hadoop-common-user mailing list archives

From David Rosenstrauch <dar...@darose.net>
Subject Subdirectory question revisited
Date Tue, 02 Jun 2009 20:22:46 GMT
As per a previous list question 

it looks as though it's not possible for hadoop to traverse input 
directories recursively in order to discover input files.

Just wondering a) if there's any particular reason why this 
functionality doesn't exist, and b) if not, if there's any 
workaround/hack to make it possible.

Like the OP, I was thinking it would be helpful to partition my input 
data by year, month, and day.  I figured this would enable me to run jobs 
against specific date ranges of input data, and thereby speed up the 
execution of my jobs since they wouldn't have to process every single 

Any way to make this happen?  (Or am I totally going about this the 
wrong way for what I'm trying to achieve?)
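
To make the date-range idea concrete, here is a minimal sketch of one possible workaround: instead of relying on recursive traversal, enumerate the day-level directories for the range up front and hand them to the job as a comma-separated list. Everything here is an assumption for illustration: the class name DatePartitionPaths, the /logs root, and the <root>/<yyyy>/<MM>/<dd> layout are hypothetical. The resulting string is the kind of value that, as far as I know, the old-API FileInputFormat.setInputPaths(JobConf, String) accepts, but that should be verified against the Hadoop version in use.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DatePartitionPaths {

    // Builds one directory path per day in [start, end] (inclusive),
    // assuming a hypothetical layout of <root>/<yyyy>/<MM>/<dd>.
    static List<String> dailyPaths(String root, LocalDate start, LocalDate end) {
        List<String> paths = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            paths.add(String.format("%s/%04d/%02d/%02d",
                    root, d.getYear(), d.getMonthValue(), d.getDayOfMonth()));
        }
        return paths;
    }

    // Joins the daily paths with commas so the job receives an explicit
    // list of leaf directories and no recursion is needed.
    static String commaSeparated(String root, LocalDate start, LocalDate end) {
        return String.join(",", dailyPaths(root, start, end));
    }

    public static void main(String[] args) {
        // Hypothetical root directory and date range.
        String inputs = commaSeparated("/logs",
                LocalDate.of(2009, 5, 30), LocalDate.of(2009, 6, 2));
        System.out.println(inputs);
    }
}
```

With this approach the job only lists the directories inside the requested range, so days outside the range are never opened. Days with no directory would need to be filtered out before submitting the job, since a missing input path may cause the job to fail.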


