hadoop-mapreduce-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Re: how to overwrite output in HDFS?
Date Wed, 04 Apr 2012 05:36:50 GMT
Hi,

Check the links below.

Read from HDFS:
https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
Write to HDFS:
https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop

Hope they help!

Thanks & regards
Arko

On Tue, Apr 3, 2012 at 7:40 AM, Christoph Schmitz
<Christoph.Schmitz@1und1.de> wrote:
> Hi Xin,
>
> When you're running your MapReduce job, at some point you'll have to wire it together,
> i.e., say what the mapper class is, what the reducer class is, etc. There you can also configure
> the job to use your new OutputFormat class. Something like this:
>
> --------------
> Job job = new Job(conf);
> job.setMapperClass(MyMapper.class);
> job.setReducerClass(MyReducer.class);
> job.setOutputFormatClass(OverwritingTextOutputFormat.class);
> ... // more setters
> job.waitForCompletion(true); // takes a boolean "verbose" flag; true prints progress
> --------------
>
> Assuming, of course, that your data is text. Your job should then use that OutputFormat
> and overwrite the output directory.
>
> If possible, though, I'd agree and go with Bejoy's solution - it is much more straightforward.
>
> Regards,
> Christoph
>
> -----Original Message-----
> From: Fang Xin [mailto:nusfangxin@gmail.com]
> Sent: Tuesday, April 3, 2012 14:31
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: how to overwrite output in HDFS?
>
> I created such a class in the project, built an instance of it in
> main, and tried to call the method it contains, but it didn't work.
> Can you explain a little more about how to make this work?
>
> On Tue, Apr 3, 2012 at 6:39 PM, Christoph Schmitz
> <Christoph.Schmitz@1und1.de> wrote:
>> Hi Xin,
>>
>> You can derive your own output format class from one of the Hadoop OutputFormats
>> and make sure the "checkOutputSpecs" method, which usually does the checking, is empty:
>>
>> -----------
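>> import java.io.IOException;
>> import org.apache.hadoop.mapreduce.JobContext;
>> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
>>
>> public final class OverwritingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
>>    @Override
>>    public void checkOutputSpecs(JobContext job) throws IOException {
>>          // Intentionally empty: skips the check that rejects an existing output directory
>>    }
>> }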
>> -----------
>>
>> Regards,
>> Christoph
>>
>> -----Original Message-----
>> From: Fang Xin [mailto:nusfangxin@gmail.com]
>> Sent: Tuesday, April 3, 2012 11:35
>> To: mapreduce-user
>> Subject: how to overwrite output in HDFS?
>>
>> Hi, all
>>
>> I'm writing my own map-reduce code using eclipse with hadoop plug-in.
>> I've specified input and output directories in the project property.
>> (two folders, namely input and output)
>>
>> My problem is that each time I make a modification and try to
>> run the job again, I have to manually delete the previous output in HDFS;
>> otherwise the job fails with an error.
>> Can anyone kindly suggest how to simply overwrite the result?
>>
>> Best regards,
>> Xin
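
One simple way to avoid the manual cleanup described in the original question is to delete the output directory from the driver before submitting the job. The following is only a minimal sketch of that general approach, assuming the Hadoop client libraries are on the classpath. The class name OutputCleaner and the method deleteIfExists are hypothetical and chosen for illustration; only the FileSystem and Path calls come from the Hadoop API.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not part of Hadoop.
public final class OutputCleaner {
    private OutputCleaner() {}

    /**
     * Recursively deletes outputDir if it exists.
     * Returns true if something was deleted, false if the path was absent.
     */
    public static boolean deleteIfExists(Configuration conf, Path outputDir) throws IOException {
        // Resolve the file system that owns this path (HDFS, local, ...).
        FileSystem fs = outputDir.getFileSystem(conf);
        if (!fs.exists(outputDir)) {
            return false;
        }
        // The second argument enables recursive deletion of the directory.
        return fs.delete(outputDir, true);
    }
}
```

In the driver, this would be called before the job is submitted, for example OutputCleaner.deleteIfExists(conf, new Path("output")) ahead of job.waitForCompletion(true), where "output" is the folder name from the original question. Unlike an empty checkOutputSpecs, this removes everything in the directory, so no files from an earlier run are left behind next to the new ones.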
