hadoop-common-user mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: Hadoop job using multiple input files
Date Fri, 06 Feb 2009 09:55:40 GMT
Hey Amandeep,

You can get the file name for a task via the "map.input.file" property. For
the join you're doing, you could inspect this property and output (number,
name) and (number, address) as your (key, value) pairs, depending on which
file you're working with. Then you can do the combination in your reducer.
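To make the idea concrete, here is a minimal sketch of that reduce-side join, run outside Hadoop with plain Java collections standing in for the shuffle. The sample records, field layout (number,name / number,address), and the "name"/"address" tags are my own assumptions; in a real job the tag would come from inspecting "map.input.file" in the mapper rather than being passed explicitly.

```java
import java.util.*;

// Sketch of a reduce-side join: tag records by source file in the "map"
// phase, group by the join key, and pair names with addresses in "reduce".
public class ReduceSideJoin {

    // "Map" phase: tag each record so the reducer knows which file it came
    // from. In a real Hadoop job you would read "map.input.file" from the
    // JobConf instead of taking an explicit tag argument.
    static void emit(Map<String, List<String[]>> shuffle,
                     String line, String tag) {
        String[] parts = line.split(",", 2);
        String number = parts[0]; // the join key
        shuffle.computeIfAbsent(number, k -> new ArrayList<>())
               .add(new String[] { tag, parts[1] });
    }

    // "Reduce" phase: for each number, pair every name with every address.
    public static List<String> join(List<String> nameFile,
                                    List<String> addressFile) {
        // TreeMap gives a deterministic key order, like the sorted shuffle.
        Map<String, List<String[]>> shuffle = new TreeMap<>();
        for (String line : nameFile)    emit(shuffle, line, "name");
        for (String line : addressFile) emit(shuffle, line, "address");

        List<String> out = new ArrayList<>();
        for (List<String[]> values : shuffle.values()) {
            List<String> names = new ArrayList<>();
            List<String> addresses = new ArrayList<>();
            for (String[] v : values) {
                if (v[0].equals("name")) names.add(v[1]);
                else addresses.add(v[1]);
            }
            for (String n : names)
                for (String a : addresses)
                    out.add(n + "," + a);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("42,Alice", "7,Bob");
        List<String> addresses = Arrays.asList("42,1 Main St", "7,2 Oak Ave");
        for (String joined : join(names, addresses))
            System.out.println(joined);
    }
}
```

The cross-product in the reducer handles the general case where a number appears more than once in either file; if the number is a unique key in both files, each group pairs exactly one name with one address.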

You could also check out the join package in contrib/utils, but I'd say your
job is simple enough that you'll get it done faster with the above method.

This task would be a simple join in Hive, so you could consider using Hive
to manage the data and perform the join.


On Fri, Feb 6, 2009 at 1:34 AM, Amandeep Khurana <amansk@gmail.com> wrote:

> Is it possible to write a map reduce job using multiple input files?
> For example:
> File 1 has data like - Name, Number
> File 2 has data like - Number, Address
> Using these, I want to create a third file which has something like - Name,
> Address
> How can a map reduce job be written to do this?
> Amandeep
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
