hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?
Date Tue, 10 Aug 2010 01:14:52 GMT
Another solution would be to create a custom named output using
mapred.lib.MultipleOutputs and collect to that instead of the
job-set output format (which one can set to NullOutputFormat so it
doesn't complain about existing paths, etc.).

So if you want a 'foo' prefix on your 00000-NNNNN numbered output
files (instead of the default 'part'), you'd create it with:
MultipleOutputs.addNamedOutput(Conf, "foo", YourOutFormat.class,
Key.class, Value.class);

The extension, I believe, can be changed too, when 'getting' the path
from the FileOutputFormat while building your RecordWriter. Something
like:
Path outPath = FileOutputFormat.getTaskOutputPath(job, name+YOUR_EXTENSION);
// Now create the 'writer' on this path.
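
To make the naming idea concrete, here is a rough, Hadoop-free sketch of
how a prefix, a partition number and an extension could combine into one
file name. The class OutputNameSketch and the method outputName are made
up for illustration and are not part of Hadoop, and the
"prefix-r-NNNNN" layout is only my assumption about how named outputs
get numbered, so check it against what your Hadoop version actually
writes.

```java
import java.util.Locale;

// Hypothetical stand-alone helper, not a Hadoop class.
final class OutputNameSketch {

    // Builds a name such as "foo-r-00000.seq".
    // ASSUMPTION: the "<prefix>-<m|r>-<5-digit partition>" layout is
    // illustrative; the real layout depends on the Hadoop version and API.
    static String outputName(String prefix, boolean isMapTask,
                             int partition, String extension) {
        String taskType = isMapTask ? "m" : "r";
        // %05d zero-pads the partition number to five digits.
        return String.format(Locale.ROOT, "%s-%s-%05d%s",
                prefix, taskType, partition, extension);
    }

    public static void main(String[] args) {
        // Reducer 0 writing to the named output 'foo' with a custom extension.
        System.out.println(outputName("foo", false, 0, ".seq"));
        // Map task 12, no extension.
        System.out.println(outputName("foo", true, 12, ""));
    }
}
```

In a real job the prefix would come from the named output and the
extension would be whatever you append to 'name' before asking
FileOutputFormat for the task output path.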

On Tue, Aug 10, 2010 at 3:30 AM, David Rosenstrauch <darose@darose.net> wrote:
> On 08/09/2010 05:45 PM, David Rosenstrauch wrote:
>> On 08/09/2010 04:01 PM, David Rosenstrauch wrote:
>>> On a similar note, it looks like if I want to customize the name/path of
>>> the generated SequenceFile my only option currently is to override
>>> FileOutputFormat.getDefaultWorkFile().
>>> a) Again, have I got this correct, or am I overlooking something?
>>> b) Would anyone else agree that this is something that can/should be
>>> made easier? (And thus worthy of a bug report?)
>>> Thanks,
>>> DR
>> Ugh. Actually, this looks even worse than I thought.
>> It looks like there's a bunch of static helper methods in
>> FileOutputFormat which use methods other than getDefaultWorkFile() to
>> determine the file name.
>> It looks like most of them use the method getUniqueFile(). Problem is
>> that getUniqueFile is a *static* method, so I can't override it with an
>> alternate implementation.
>> Anyone know any short way out of this conundrum without my having to
>> completely reimplement chunks of
>> FileOutputFormat/SequenceFileOutputFormat?
>> Thanks,
>> DR
> Hmmm ... on second look, overriding getDefaultWorkFile() should work. That's
> the method called by SequenceFileOutputFormat.getRecordWriter. So sorry for
> the noise.
> Still, would be helpful if there were a less kludgey way to handle this, I'd
> think.
> Thanks,
> DR

Harsh J
