hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Custom FileOutputFormat / RecordWriter
Date Tue, 26 Jul 2011 08:34:30 GMT
Tom,

What I meant to say was that doing this is well supported by the
existing API/libraries themselves:

- The MultipleOutputs class supports providing a filename for an
output. See MultipleOutputs.addNamedOutput usage [1].
- The type 'NullWritable' is a special Writable that carries no data.
So if it is configured as the key type when adding the named output,
and you pass NullWritable.get() as the key in every write operation,
you will end up writing just the value part of (key, value).
- This way you do not have to write a custom OutputFormat for your
use-case (a rough sketch follows below).

[1] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
(Also available for the new API, depending on which
version/distribution of Hadoop you are on)
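
To make the above concrete, here is a rough sketch against the old
(mapred) API from [1]. It assumes a map-only job whose input key
already holds the desired file name; the names used here
(ValueOnlyMapper, "data", KeyValueTextInputFormat as the input) are
purely illustrative, and I'm using the addMultiNamedOutput variant
since the file-name part is chosen per record (plain addNamedOutput
works the same way when the output names are fixed up front):

  import java.io.IOException;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;
  import org.apache.hadoop.mapred.lib.MultipleOutputs;
  import org.apache.hadoop.mapred.lib.NullOutputFormat;

  public class ValueOnlyMapper extends MapReduceBase
      implements Mapper<Text, Text, NullWritable, Text> {

    private MultipleOutputs mos;

    @Override
    public void configure(JobConf conf) {
      mos = new MultipleOutputs(conf);
    }

    @Override
    public void map(Text filename, Text data,
        OutputCollector<NullWritable, Text> output, Reporter reporter)
        throws IOException {
      // The key only decides which file the record lands in; the file
      // itself gets just the value, because the key is NullWritable.
      // Note: MultipleOutputs only accepts letters and digits in the
      // named output and the per-record name, so sanitize if needed.
      mos.getCollector("data", filename.toString(), reporter)
          .collect(NullWritable.get(), data);
    }

    @Override
    public void close() throws IOException {
      mos.close(); // flush and close all the side files
    }

    // Driver: register a multi named output whose file-name part is
    // picked per record in map(). NullWritable as the key class means
    // only the values reach the files.
    public static void main(String[] args) throws IOException {
      JobConf conf = new JobConf(ValueOnlyMapper.class);
      conf.setMapperClass(ValueOnlyMapper.class);
      conf.setInputFormat(KeyValueTextInputFormat.class); // key = file name, value = data
      conf.setNumReduceTasks(0);                          // map-only job
      conf.setOutputFormat(NullOutputFormat.class);       // no empty default part files
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      MultipleOutputs.addMultiNamedOutput(conf, "data",
          TextOutputFormat.class, NullWritable.class, Text.class);
      JobClient.runJob(conf);
    }
  }

The files should then come out roughly as data_<name>-m-NNNNN under the
job's output directory.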

On Tue, Jul 26, 2011 at 3:36 AM, Tom Melendez <tom@supertom.com> wrote:
> Hi Harsh,
>
> Thanks for the response.  Unfortunately, I'm not following your response.  :-)
>
> Could you elaborate a bit?
>
> Thanks,
>
> Tom
>
> On Mon, Jul 25, 2011 at 2:10 PM, Harsh J <harsh@cloudera.com> wrote:
>> You can use MultipleOutputs (or MultipleTextOutputFormat for direct
>> key-to-file mapping, but I'd still prefer the stable MultipleOutputs).
>> The key you sink can be of NullWritable type, and you can keep passing
>> NullWritable.get() as that key in every write cycle. This would
>> write just the value, while the filenames are added/sourced from the
>> key inside the mapper code.
>>
>> That's the route to take if you'd rather not write and maintain your
>> own code, I s'pose. Your approach is correct as well, if the question
>> was specifically about that.
>>
>> On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez <tom@supertom.com> wrote:
>>> Hi Folks,
>>>
>>> Just doing a sanity check here.
>>>
>>> I have a map-only job which produces a filename as the key and data
>>> as the value.  I want to write the value (data) into a file named by
>>> the key (filename), under the output path specified when I run the job.
>>>
>>> The value (data) doesn't need any formatting, I can just write it to
>>> HDFS without modification.
>>>
>>> So, looking at this link (the Output Formats section):
>>>
>>> http://developer.yahoo.com/hadoop/tutorial/module5.html
>>>
>>> Looks like I want to:
>>> - create a new output format
>>> - override write() so it skips writing the key, since I don't want
>>> that in the output
>>> - a new getRecordWriter method that uses the key as the filename and
>>> returns my record writer
>>>
>>> Sound reasonable?
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>> --
>>> ===================
>>> Skybox is hiring.
>>> http://www.skyboximaging.com/careers/jobs
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> ===================
> Skybox is hiring.
> http://www.skyboximaging.com/careers/jobs
>



-- 
Harsh J
