hadoop-common-user mailing list archives

From "Alejandro Abdelnur" <tuc...@gmail.com>
Subject Re: Outputting to different paths from the same input file
Date Mon, 14 Jul 2008 10:39:25 GMT
You can use MultipleOutputFormat or MultipleOutputs (it was committed to
SVN a few days ago) for this.
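A minimal sketch of why MultipleOutputs suits this case better than MultipleOutputFormat: each named output can have its own record shape, and a single pass over the input can write to all of them. In Hadoop itself each named output is declared with MultipleOutputs.addNamedOutput(conf, name, outputFormatClass, keyClass, valueClass) and written through mos.getCollector(name, reporter), with mos.close() in the task's close(). Since the class was brand new at the time, check the javadoc of your release. The code below does not use Hadoop. It is plain Java that models only the routing, and the event format, the output names "stats" and "budget", and the class NamedOutputsSketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "named outputs"; this is NOT the Hadoop API.
// One pass over the input, and each event is written to two outputs
// whose records have different shapes.
final class NamedOutputsSketch {

    // Hypothetical event format: "user,campaign,costCents".
    static Map<String, List<String>> route(List<String> events) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("stats", new ArrayList<>());
        outputs.put("budget", new ArrayList<>());

        for (String event : events) {
            String[] fields = event.split(",");
            String user = fields[0];
            String campaign = fields[1];
            long costCents = Long.parseLong(fields[2]);

            // "stats" output: a per-user count record (user, 1).
            outputs.get("stats").add(user + "\t1");
            // "budget" output: a differently shaped record (campaign, cost).
            outputs.get("budget").add(campaign + "\t" + costCents);
        }
        return outputs;
    }
}
```

In a Hadoop job the two add() calls would become two collector calls inside map(), so the event log is still read only once.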

Then, in the follow-up jobs, you can set a filter on the input directory so
that only files matching a given name or pattern are used.
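The input filter for the follow-up jobs could be a Hadoop PathFilter (its single method is accept(Path)) that matches on the file name. The sketch below models only that name check in plain Java, so it runs without Hadoop. It assumes output files are named "<name>-m-NNNNN" or "<name>-r-NNNNN". That naming scheme and the class InputNameFilter are assumptions to check against the files your job actually produces.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Models the name check a PathFilter.accept(Path) could perform; NOT Hadoop code.
final class InputNameFilter {

    // Accepts names like "stats-m-00000" or "stats-r-00003" for output "stats".
    // The "<name>-m|r-NNNNN" scheme is an assumption.
    static Predicate<String> forNamedOutput(String name) {
        Pattern pattern = Pattern.compile(Pattern.quote(name) + "-[mr]-\\d{5}");
        return fileName -> pattern.matcher(fileName).matches();
    }

    // Keeps only the file names the filter accepts, preserving their order.
    static List<String> select(List<String> fileNames, Predicate<String> filter) {
        return fileNames.stream().filter(filter).collect(Collectors.toList());
    }
}
```

For example, given "part-00000", "stats-m-00000", "budget-m-00000", "stats-r-00003" and "_logs", the "stats" filter keeps only the two stats files.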


On Fri, Jul 11, 2008 at 8:54 PM, Jason Venner <jason@attributor.com> wrote:
> We open side-effect files in our map and reduce tasks to 'tee' off
> additional data streams.
> We open them in the configure() method and close them in the close() method.
> The configure() method provides access to the JobConf.
> We create our files relative to the value of conf.get("mapred.output.dir")
> in the map/reduce object instances.
> The files end up in the conf.getOutputPath() directory. After the job
> finishes, we move all of them to another location, using a file-name-based
> filter (we know the shape of the file names) to select the files to move
> (from the job
> schnitzi wrote:
>> Okay, I've found some similar discussions in the archive, but I'm still not
>> clear on this. I'm new to Hadoop, so 'scuse my ignorance...
>> I'm writing a Hadoop tool to read in an event log, and I want to produce two
>> separate outputs as a result: one for statistics, and one for budgeting.
>> Because the event log I'm reading in can be massive, I would like to process
>> it only once. But the outputs will each be read by further M/R processes,
>> and will be significantly different from each other.
>> I've looked at MultipleOutputFormat, but it seems to just want to partition
>> data that looks basically the same into this file or that.
>> What's the proper way to do this? Ideally, whatever solution I implement
>> should be atomic, in that if any one of the writes fails, neither output
>> will be produced.
>> AdTHANKSvance,
>> Mark
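On the atomicity requirement, one way to get all-or-nothing behavior for a pair of outputs is to write both into a single staging directory and publish them with one rename, only after every write has succeeded. The plain-Java sketch below illustrates that idea on a local filesystem. It is not how Hadoop commits job output, and the class StagedCommit and the "_staging-" prefix are hypothetical. It assumes the final directory does not exist yet and that the staging and final directories are on the same filesystem, which ATOMIC_MOVE requires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical all-or-nothing publish of several outputs (local filesystem only).
final class StagedCommit {

    // Runs writer against a staging directory next to finalDir. If writer
    // completes, staging is renamed to finalDir in one atomic move; if it
    // throws, staging is deleted and finalDir is never created.
    // Assumes finalDir does not exist yet.
    static void writeAllOrNothing(Path finalDir, Consumer<Path> writer) {
        Path staging = finalDir.resolveSibling("_staging-" + finalDir.getFileName());
        try {
            Files.createDirectories(staging);
            try {
                writer.accept(staging);
            } catch (RuntimeException e) {
                deleteTree(staging);
                throw e;
            }
            Files.move(staging, finalDir, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a file without a checked exception, so it can be used in a lambda.
    static void write(Path file, String content) {
        try {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a directory tree, children before parents.
    private static void deleteTree(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

If the writer fails after producing "stats" but before "budget", neither file becomes visible under the final directory.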
> --
> Jason Venner
> Attributor - Program the Web <http://www.attributor.com/>
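The side-effect-file approach in Jason's message can be sketched as follows, with plain Java standing in for the task lifecycle: the constructor plays the role of configure() and opens a per-task file in the output directory, tee() is called from map() or reduce(), close() mirrors close(), and moveMatching() is the post-job step that moves files selected by a name pattern. The "side-" prefix, the taskId argument, and the class SideEffectTee are hypothetical. In a real task the directory would come from conf.get("mapred.output.dir") as described in the message.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a task that tees records into a side-effect file.
final class SideEffectTee {
    private final BufferedWriter side;

    // Plays the role of configure(): open a per-task side file inside the
    // job's output directory (the task id keeps file names from colliding).
    SideEffectTee(Path outputDir, String taskId) {
        try {
            side = Files.newBufferedWriter(outputDir.resolve("side-" + taskId), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called for each record that should be copied into the side stream.
    void tee(String record) {
        try {
            side.write(record);
            side.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of close(): flush and close the side file.
    void close() {
        try {
            side.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Post-job step: move every file in outputDir whose name matches the
    // glob (e.g. "side-*") into targetDir. Returns the number of files moved.
    static int moveMatching(Path outputDir, String glob, Path targetDir) {
        try {
            List<Path> matches = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, glob)) {
                stream.forEach(matches::add);
            }
            Files.createDirectories(targetDir);
            for (Path f : matches) {
                Files.move(f, targetDir.resolve(f.getFileName()));
            }
            return matches.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The matches are collected before any file is moved so the directory is not modified while it is being listed.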
