hadoop-mapreduce-user mailing list archives

From Dino Kečo <dino.k...@gmail.com>
Subject Re: Multiple input formats and multiple output formats in Hadoop 0.20.2
Date Wed, 10 Aug 2011 16:20:26 GMT
Hi John,

I think this is what you are looking for:



Usage examples are included in the API documentation.

Dino Kečo

On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:

> Hi,
> I am working on a project that requires multiple input formats and
> multiple output formats. Basically, I store some sales rank data in a
> Cassandra cluster, and each day I get a sales rank update file to update the
> ranks in Cassandra. At the same time, I need to find all the products
> whose rank change exceeds a threshold and output them to a file. That is to
> say, I need two input formats, one from the file system (sales rank update
> file) and one from Cassandra (current sales rank), and two output
> formats, one to the file system (products whose rank change exceeds a
> threshold) and one to Cassandra (write the new rank to Cassandra).
> Right now, I use multiple cascading jobs to do the work and use HDFS to
> share data among the jobs. But this is not very efficient, since some
> intermediate files need to be read multiple times in different jobs. I
> wonder if there is a more elegant way to solve this problem. It seems Hadoop
> 0.19 supports multiple input/output formats. It would be great if I could
> merge the multiple jobs into one with multiple input formats and multiple
> output formats. Is this doable in Hadoop 0.20.2? Are there any examples of
> multiple input formats and multiple output formats for Hadoop 0.20.2?
> Thanks in advance,
> John
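
For illustration only, here is a minimal sketch of the per-product logic described in the quoted message: join the current ranks with the daily update, collect the new ranks, and separately collect the products whose rank change exceeds a threshold. It is plain Java with no Hadoop or Cassandra dependency, and every name in it (RankChangeSketch, Result, process, the sample product ids) is hypothetical. It assumes ranks are integers keyed by a product id string, which the original message does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RankChangeSketch {

    /** Holds the two outputs produced in a single pass (hypothetical type). */
    public static final class Result {
        /** New rank per product; in the real job this would be written to Cassandra. */
        public final Map<String, Integer> newRanks = new LinkedHashMap<String, Integer>();
        /** Products whose rank moved by more than the threshold; would be written to a file. */
        public final List<String> bigMovers = new ArrayList<String>();
    }

    /**
     * Joins the current ranks with the update file contents by product id.
     * Assumption: a product with no current rank is stored but never reported
     * as a big mover, since there is no previous rank to compare against.
     */
    public static Result process(Map<String, Integer> currentRanks,
                                 Map<String, Integer> updates,
                                 int threshold) {
        Result result = new Result();
        for (Map.Entry<String, Integer> update : updates.entrySet()) {
            String product = update.getKey();
            int newRank = update.getValue();
            Integer oldRank = currentRanks.get(product);

            // First output: always record the new rank.
            result.newRanks.put(product, newRank);

            // Second output: only products whose change exceeds the threshold.
            if (oldRank != null && Math.abs(newRank - oldRank) > threshold) {
                result.bigMovers.add(product);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = new LinkedHashMap<String, Integer>();
        current.put("A", 10);
        current.put("B", 500);

        Map<String, Integer> updates = new LinkedHashMap<String, Integer>();
        updates.put("A", 12);   // moved by 2
        updates.put("B", 100);  // moved by 400
        updates.put("C", 7);    // new product, no previous rank

        Result result = process(current, updates, 50);
        System.out.println("new ranks:  " + result.newRanks);
        System.out.println("big movers: " + result.bigMovers);
    }
}
```

In a single merged job, the same comparison would presumably run once per product key after records from both inputs have been grouped together, with each of the two collections going to its own output.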
