hadoop-general mailing list archives

From Alberto Luengo Cabanillas <cabi...@gmail.com>
Subject Re: CHANGING FINAL OUTPUT FILE NAME
Date Fri, 02 Oct 2009 15:27:12 GMT
Hi again! I've been trying to apply the first of the solutions Jason
proposed two mails ago, but I have some questions about it:

1. In Hadoop 0.20.1, "RecordWriter" is an abstract class, so it can't be
instantiated directly.
2. Are the "configure" and "close" methods of my task that you refer to the
"configure" and "close" methods used in the Reducer?

It would help a lot if you could post some code (only the parts
relevant to this issue) so I can get a clearer picture.

Finally, I'd like to show the cleanup method (the counterpart of "close") in
my reducer class, because I use a different approach and I can't see why it
fails:

    @Override
    protected void cleanup(Context cont) throws IOException {

      //write output to a file
      Configuration conf = new Configuration();
      JobContext jCont = new JobContext(conf, null);
      FileSystem fs = FileSystem.get(jCont.getConfiguration());
      Path outDir = new Path("/user/hadoop-user/output", "output");
      Path outFile = new Path(outDir, "reduce-out");
      SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
          outFile, LongWritable.class, LongWritable.class,
          CompressionType.NONE);
      writer.append(new Text(keyword), new IntWritable(fitnessValue));
      writer.close();
    }

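One possible reason for the failure, offered as a sketch rather than a confirmed diagnosis: the writer above is created with LongWritable as both key and value class, while append() is given a Text and an IntWritable. SequenceFile.Writer rejects keys or values whose class differs from the declared one. The version below declares Text and IntWritable instead and takes the Configuration from the task context. The class name FitnessReducer, its generic types, the helper writeResult, and the types of the keyword and fitnessValue fields are assumptions, since the rest of the reducer is not shown.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: the generic types and both fields are assumptions.
class FitnessReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private String keyword = "";   // assumed to be filled in by reduce()
  private int fitnessValue = 0;  // assumed to be filled in by reduce()

  // Writes a single (Text, IntWritable) record to outFile.
  // The key and value classes passed to createWriter match what is appended.
  static void writeResult(FileSystem fs, Configuration conf, Path outFile,
      String keyword, int fitnessValue) throws IOException {
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, outFile,
        Text.class, IntWritable.class, CompressionType.NONE);
    try {
      writer.append(new Text(keyword), new IntWritable(fitnessValue));
    } finally {
      writer.close();
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Reuse the job's configuration instead of building a fresh one.
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path outDir = new Path("/user/hadoop-user/output", "output");
    writeResult(fs, conf, new Path(outDir, "reduce-out"), keyword, fitnessValue);
  }
}
```

A fixed path like this is only safe with a single reduce task; with several reducers, each task would try to create the same file.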
Thanks a lot in advance!

2009/10/2 Jason Venner <jason.hadoop@gmail.com>

> I see these ways to go here.
>
>   1. The one I know to work is to create a RecordWriter in the configure
>   method of your task, in the per-task work/output directory, and then
>   rename it to your chosen name in the close method. Your task calls write
>   on the RecordWriter directly instead of output.collect.
>   2. Use the multi output format.
>   3. In the close method of the task, rename the part-xxx file to your
>   name. I am not certain that this is safe in the close method of the task.
>   4. Define a custom OutputCommitter class which renames the file to your
>   chosen name.
>
>
>
>
> On Thu, Oct 1, 2009 at 1:00 PM, Alberto Luengo Cabanillas
> <cabiwan@gmail.com> wrote:
>
> > Hi everyone! I have a newbie question: I'm currently using Hadoop 0.20.1
> > and I'd like to know how I can change the name of the resulting file to
> > the one I want (i.e. from "part-r-00000" to "myoutput"). I've found
> > something related in JIRA
> > (https://issues.apache.org/jira/browse/MAPREDUCE-370), but I don't know
> > for sure if that is my problem too. If it is, do I just apply the patch
> > to the affected file and I'm ready to go, or do I need to do something
> > more afterwards?
> > Thanks a lot!
> >
> > --
> > Alberto
> >
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>



-- 
Alberto
