hadoop-common-user mailing list archives

From David Rosenstrauch <dar...@darose.net>
Subject Subdirectory question revisited
Date Tue, 02 Jun 2009 20:22:46 GMT
As per a previous list question 
(http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3Ce75c02ef0804011433x144813e6x2450da7883de3aca@mail.gmail.com%3E)

it looks as though it's not possible for Hadoop to traverse input 
directories recursively in order to discover input files.

Just wondering: a) is there any particular reason why this 
functionality doesn't exist, and b) if not, is there any 
workaround/hack to make it possible?

Like the OP, I was thinking it would be helpful to partition my input 
data by year, month, and day.  I figured this would enable me to run jobs 
against specific date ranges of input data, and thereby speed up the 
execution of my jobs, since they wouldn't have to process every single 
record.

Any way to make this happen?  (Or am I totally going about this the 
wrong way for what I'm trying to achieve?)
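[For context, the workaround usually suggested for this is to enumerate the 
input files yourself and hand the job an explicit, comma-separated list of 
paths via FileInputFormat.setInputPaths(conf, paths). The sketch below is 
illustrative only: the class name CollectInputs is made up, and it walks the 
local filesystem with java.io.File for the sake of a self-contained example; 
a real job would do the same traversal against HDFS with 
org.apache.hadoop.fs.FileSystem.listStatus().]

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CollectInputs {

    // Recursively gather every regular file under root. In a real job the
    // same traversal would be done with FileSystem.listStatus() on HDFS
    // rather than java.io.File, but the logic is identical.
    static void collect(File root, List<String> out) {
        File[] entries = root.listFiles();
        if (entries == null) {
            return; // not a directory, or unreadable
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                collect(f, out);
            } else {
                out.add(f.getPath());
            }
        }
    }

    // Join the collected paths into the comma-separated form that
    // FileInputFormat.setInputPaths(JobConf, String) accepts.
    static String joinPaths(List<String> paths) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < paths.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(paths.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<String>();
        collect(new File(args[0]), paths);
        System.out.println(joinPaths(paths));
    }
}
```

[If the partition layout has a fixed depth, an even simpler route is to lean 
on the glob expansion Hadoop already does on input paths: passing something 
like /logs/2009/06/* (or /logs/*/*/* for everything) to setInputPaths selects 
the date range in the path pattern itself, with no traversal code at all. The 
/logs/... layout here is hypothetical.]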

TIA,

DR
