From: Aaron Kimball
Date: Tue, 2 Jun 2009 16:22:21 -0700
Subject: Re: Subdirectory question revisited
To: core-user@hadoop.apache.org
Reply-To: core-user@hadoop.apache.org
Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
In-Reply-To: <4A258A16.8050300@darose.net>

There is no technical limit that prevents Hadoop from
operating in this fashion; it's simply that the included InputFormat
implementations do not. This behavior has been in place for a long time,
so it's unlikely to change soon, as that might break existing
applications. But you can write your own subclass of TextInputFormat or
SequenceFileInputFormat that overrides the getSplits() method to
recursively descend through directories and search for files.

- Aaron

On Tue, Jun 2, 2009 at 1:22 PM, David Rosenstrauch wrote:

> As per a previous list question (
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3Ce75c02ef0804011433x144813e6x2450da7883de3aca@mail.gmail.com%3E)
> it looks as though it's not possible for Hadoop to traverse input
> directories recursively in order to discover input files.
>
> Just wondering a) if there's any particular reason why this
> functionality doesn't exist, and b) if not, whether there's any
> workaround/hack to make it possible.
>
> Like the OP, I was thinking it would be helpful to partition my input
> data by year, month, and day. I figured this would enable me to run
> jobs against specific date ranges of input data, and thereby speed up
> the execution of my jobs, since they wouldn't have to process every
> single record.
>
> Any way to make this happen? (Or am I totally going about this the
> wrong way for what I'm trying to achieve?)
>
> TIA,
>
> DR
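[A sketch of the recursive descent Aaron describes. To keep it self-contained and runnable it walks the local filesystem with java.nio.file rather than Hadoop's FileSystem API; in an actual InputFormat subclass you would perform the same walk with FileSystem.listStatus() inside your overridden getSplits(). The class name RecursiveLister is made up for illustration.]

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrates the directory walk an InputFormat subclass would do:
// instead of taking only the files directly under each input path,
// descend into subdirectories and collect every file found.
public class RecursiveLister {

    // Collect every regular file under root, recursing into directories.
    public static List<Path> listRecursively(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(root)) {
            for (Path entry : stream) {
                if (Files.isDirectory(entry)) {
                    files.addAll(listRecursively(entry)); // descend
                } else {
                    files.add(entry);
                }
            }
        }
        return files;
    }
}
```

With input laid out as year/month/day subdirectories, the same idea lets the job pick up files at any depth, where the stock getSplits() would only see the top level.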