hadoop-common-user mailing list archives

From "Peter W." <pe...@marketingbrokers.com>
Subject Re: Sort inputs, outputs
Date Sun, 01 Jul 2007 23:32:38 GMT
Hi,

You could also do this the old-fashioned way:

       try
          {
          JobConf jc=new JobConf(sample.class);
          ...
          jc.setInputPath(new Path(IN_DIR));
          jc.setOutputPath(new Path(OUT_DIR));
          JobClient.runJob(jc);
          }
       catch(Exception e){System.out.println(e);}

       String[] d=new File(OUT_DIR).list();
       Arrays.sort(d);

       for(int dint=0;dint<d.length;dint++)
          {
          if(!d[dint].startsWith("."))   // no dot files
             {
             String PART_FILE=OUT_DIR+"/"+d[dint];   // part path

             // create file object from part path string,
             // do file merge, append, cleanup

             }
          }
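
For what it's worth, the merge step left as a comment above might look roughly like the sketch below. The names PartMerger and mergeParts are made up for illustration and are not Hadoop APIs. It assumes the output directory is on the local filesystem (plain Java file I/O does not see HDFS) and that the merged file is written somewhere outside that directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: concatenates local part files.
public class PartMerger {

    // Appends every regular, non-dot file in outDir to merged, in sorted
    // file-name order, and returns how many part files were appended.
    // Assumption: merged is located outside outDir.
    public static int mergeParts(Path outDir, Path merged) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (Stream<Path> entries = Files.list(outDir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> !p.getFileName().toString().startsWith(".")) // no dot files
                   .forEach(parts::add);
        }
        // Same ordering as Arrays.sort on the name strings: part-00000, part-00001, ...
        parts.sort(Comparator.comparing(p -> p.getFileName().toString()));

        // Creates the merged file, or truncates it if it already exists.
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                Files.copy(part, out); // append the bytes of this part
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        int n = mergeParts(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + n + " part files");
    }
}
```

It would be run with the output directory and the target file as its two arguments. Cleanup of the part files is left out on purpose.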

Later,

Peter W.


On Jul 1, 2007, at 10:45 AM, Devaraj Das wrote:

>> 3. Are the outputs of the test programs typically part-00000,
>> part-00001, ...part-XXXXX?
>> Is there any suggested method for merging them?
>
> Yes. You could run another mapreduce job with exactly one reduce to
> merge them.
