hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Processing multiple files - need to identify in map
Date Wed, 05 Mar 2008 01:54:14 GMT


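Use the mapper's configure method, which is called once at the start of each
map task, and so each time the map begins on a new input file or split.  The
name of the current input file is available from the job configuration under
the key map.input.file.  Save it in a field of the mapper.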
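
A minimal sketch of that pattern, written in plain Java so that it runs
without the Hadoop jars. The LogMapper class and its Map-based configure
signature are hypothetical stand-ins, not Hadoop's own Mapper interface; in a
real job with the old mapred API the same lookup would presumably be
job.get("map.input.file") inside configure(JobConf job). The file path in
main is also made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class FileAwareMapperSketch {

    /** Hypothetical stand-in for a mapper; NOT Hadoop's Mapper interface. */
    static class LogMapper {
        // Remembered across map() calls for the lifetime of this task.
        private String inputFile = "unknown";

        // Plays the role of configure(JobConf): called once before any
        // records are mapped. A plain Map stands in for the job configuration.
        void configure(Map<String, String> conf) {
            String path = conf.get("map.input.file");
            if (path != null) {
                inputFile = path;
            }
        }

        // Tags each input line with the file it came from, so later stages
        // no longer need to guess the log type from the line's structure.
        String map(long offset, String line) {
            return inputFile + "\t" + line;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("map.input.file", "/logs/weblog/2008-03-04.log"); // made-up path

        LogMapper mapper = new LogMapper();
        mapper.configure(conf);
        System.out.println(mapper.map(0L, "GET /index.html"));
    }
}
```

With the file name held in a field, the map method can branch on it (for
example, on the parent directory) rather than on the shape of each line.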

The other alternative is to derive a new InputFormat that remembers the
input file name.

On 3/4/08 5:38 PM, "Tarandeep Singh" <tarandeep@gmail.com> wrote:

> Hi,
> I need to identify, in the map phase, which file a key came from.
> Is it possible?
> What I have is multiple types of log files in one directory that I
> need to process for my application. Right now, I am relying on the
> structure of the log files (e.g. if a line starts with "weblog", the
> line came from Log File A, or if the number of tab-separated fields in
> the line is N, then it is Log File B).
> Is there a better way to do this?
> Is there a way for the Hadoop framework to pass me the path of the
> file as the key (right now it is the offset in the file, I guess)?
> One more related question - can I set 2 directories as input to my map
> reduce program? This is just to avoid copying files from one log
> directory to another.
> thanks,
> Taran
