hadoop-common-user mailing list archives

From Guy Doulberg <Guy.Doulb...@conduit.com>
Subject RE: DataCreator
Date Wed, 16 Feb 2011 15:37:27 GMT

How can I use Hive to do that?

I was thinking of using Cascading, but Cascading requires me to recompile and redeploy for
each change in the data flow. Maybe Cascading can be part of the implementation but not the

As for Pig, I would need to look at how I can use it to achieve the purpose.

In my vision, a non-skilled person would have a UI in which he could assign transformations
and partitions to each source.
What I am looking for is very similar to Flume, except that Flume is for event streaming,
and what I am looking for is for chunks of data.

From: Ted Dunning [mailto:tdunning@maprtech.com]
Sent: Wednesday, February 16, 2011 5:19 PM
To: common-user@hadoop.apache.org
Cc: Guy Doulberg
Subject: Re: DataCreator

Sounds like Pig.  Or Cascading.  Or Hive.

Seriously, isn't this already available?
On Wed, Feb 16, 2011 at 7:06 AM, Guy Doulberg <Guy.Doulberg@conduit.com> wrote:

Hey all,
I want to consult with you Hadoopers about a Map/Reduce application I want to build.

I want to build a map/reduce job that reads files from HDFS, performs some sort of transformation
on the file lines, and stores them in several partitions depending on the source of the file
or its data.

I want this application to be as configurable as possible, so I designed interfaces to Parse,
Decorate and Partition (on HDFS) the data.

I want to be able to configure different data flows, with different parsers, decorators and
partitioners, using a config file.
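
To make the parse / decorate / partition idea concrete, here is a minimal sketch in plain Java. All of the names (LineParser, RecordDecorator, RecordPartitioner, DataFlow) are hypothetical and only illustrate how the three roles could compose; they are not taken from any existing library, and the example flow assumes comma-separated "source,payload" lines.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interfaces for the three roles described above.
interface LineParser { Map<String, String> parse(String line); }
interface RecordDecorator { void decorate(Map<String, String> record); }
interface RecordPartitioner { String partitionFor(Map<String, String> record); }

// One data flow: a parser, zero or more decorators, and a partitioner.
final class DataFlow {
    private final LineParser parser;
    private final List<RecordDecorator> decorators;
    private final RecordPartitioner partitioner;

    DataFlow(LineParser parser, List<RecordDecorator> decorators,
             RecordPartitioner partitioner) {
        this.parser = parser;
        this.decorators = decorators;
        this.partitioner = partitioner;
    }

    // Parses the line, applies every decorator in order, and returns the
    // target partition together with the decorated record.
    Map.Entry<String, Map<String, String>> process(String line) {
        Map<String, String> record = parser.parse(line);
        for (RecordDecorator decorator : decorators) {
            decorator.decorate(record);
        }
        return new SimpleImmutableEntry<>(partitioner.partitionFor(record), record);
    }

    // Example flow; assumes every line contains at least one comma.
    static DataFlow example() {
        LineParser csv = line -> {
            String[] parts = line.split(",", 2);
            Map<String, String> record = new LinkedHashMap<>();
            record.put("source", parts[0]);
            record.put("payload", parts[1]);
            return record;
        };
        RecordDecorator addLength = record ->
            record.put("length", String.valueOf(record.get("payload").length()));
        RecordPartitioner bySource = record -> "/data/" + record.get("source");
        return new DataFlow(csv, Arrays.asList(addLength), bySource);
    }
}
```

With this sketch, DataFlow.example().process("web,hello") yields the partition "/data/web" and a record carrying source, payload and length fields. In a real job the process step would presumably sit inside the Mapper's map method.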

Do you think you would use such an application? Would it fit as an open-source project?

Now, I have some technical questions:
I was thinking of using reflection to load all the classes I would need, according to the
configuration, during the setup process of the Mapper.
Do you think it is a good idea?
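
A rough sketch of what the reflection approach could look like, using only the JDK. The names LineTransform, UpperCaseTransform, PluginLoader and the key "flow.transform" are made up for illustration, and java.util.Properties stands in for whatever configuration object the Mapper would actually read during setup.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical plugin interface and one implementation.
interface LineTransform { String apply(String line); }

class UpperCaseTransform implements LineTransform {
    @Override
    public String apply(String line) {
        return line.toUpperCase(Locale.ROOT);
    }
}

final class PluginLoader {
    private PluginLoader() {}

    // Reads a class name from the config, checks that it implements the
    // expected type, and instantiates it through its no-argument constructor.
    static <T> T load(Properties config, String key, Class<T> type) {
        String className = config.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        try {
            Class<?> raw = Class.forName(className);
            return raw.asSubclass(type).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot load " + className + " as " + type.getName(), e);
        }
    }
}
```

Usage would be along the lines of setting "flow.transform" to UpperCaseTransform.class.getName() and calling PluginLoader.load(config, "flow.transform", LineTransform.class). Failing fast with a clear message when a class is missing or of the wrong type seems worthwhile here, since a misconfigured flow would otherwise only surface as a task failure on the cluster.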

Is there a way to send the Mapper objects or interfaces from the Job declaration?
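
On sending objects from the job declaration: since the job configuration travels to the tasks as string key/value pairs, one possible workaround is to serialize a configured object into a string on the client side and rebuild it in the Mapper's setup. The sketch below uses a plain Map<String, String> as a stand-in for the job configuration; ConfigObjects, PrefixDecorator and the key names are hypothetical. Passing only class names through the configuration, as in the reflection approach, is the other obvious option.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Map;

// Hypothetical decorator that carries state (a prefix) set at job declaration.
class PrefixDecorator implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String prefix;

    PrefixDecorator(String prefix) { this.prefix = prefix; }

    String decorate(String line) { return prefix + line; }
}

final class ConfigObjects {
    private ConfigObjects() {}

    // Client side: serialize the object and store it as a Base64 string.
    static void put(Map<String, String> conf, String key, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("Cannot serialize value for " + key, e);
        }
        conf.put(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Task side: decode the string and rebuild the object.
    static <T> T get(Map<String, String> conf, String key, Class<T> type) {
        String encoded = conf.get(key);
        if (encoded == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return type.cast(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Cannot deserialize value for " + key, e);
        }
    }
}
```

This only works for objects that are Serializable and whose classes are on the task's classpath, and large objects would bloat the configuration, so it is probably best kept for small, configuration-like objects.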

