hadoop-common-user mailing list archives

From "Teng, James" <xt...@ebay.com>
Subject Multiple Output Format -Unrecognizable Characters in Output File
Date Mon, 18 Jul 2011 06:00:42 GMT

Hi,
I ran into a problem while trying to define my own MultipleOutputFormat class. The code is
below.
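import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultipleOutputFormat extends FileOutputFormat<LongWritable,Text>{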
      public class LineWriter extends RecordWriter<LongWritable,Text>{
            private DataOutputStream output;
            private byte separatorBytes[];
            public LineWriter(DataOutputStream output, String separator) throws UnsupportedEncodingException
            {
                  this.output=output;
                  this.separatorBytes=separator.getBytes("UTF-8");
            }
            @Override
            public synchronized void close(TaskAttemptContext context) throws IOException,
                        InterruptedException {
                  // TODO Auto-generated method stub
                  output.close();
            }

            @Override
            public void write(LongWritable key, Text value) throws IOException,
                        InterruptedException {
                  System.out.println("key:"+key.get());
                  System.out.println("value:"+value.toString());
                  // TODO Auto-generated method stub
                  //output.writeLong(key.)
                  //output.write(separatorBytes);
                  //output.write(value.toString().getBytes("UTF-8"));
                  //output.write("\n".getBytes("UTF-8"));
                  //key.write(output);
                  key.write(output);
                  value.write(output);

                  output.write("\n".getBytes("UTF-8"));
            }
      }
      private Path path;
      protected String generateFileNameForKeyValue(LongWritable key,Text value,String name)
      {
            return "key"+Math.random();
      }

      @Override
      public RecordWriter<LongWritable, Text> getRecordWriter(
                  TaskAttemptContext context) throws IOException, InterruptedException {
            path=getOutputPath(context);
            System.out.println("ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd");
            // TODO Auto-generated method stub
            Path file = getDefaultWorkFile(context, "");
            FileSystem fs = file.getFileSystem(context.getConfiguration());

            FSDataOutputStream fileOut = fs.create(file, false);

            return new LineWriter(fileOut, "\t");

      }
}

However, unrecognizable characters appear in the output file. Has anyone encountered this
problem before? Any comments are greatly appreciated. Thanks in advance.
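
A possible explanation, offered as a sketch rather than a confirmed diagnosis: key.write(output) and value.write(output) use the Writable binary serialization, so LongWritable emits a raw 8-byte long and Text emits a length prefix followed by its UTF-8 bytes. A text viewer would show those raw bytes as unrecognizable characters. The example below uses only java.io so that it runs without Hadoop. The class BinaryVersusTextDemo and its method names are hypothetical, and the single length byte is a simplification of Text's variable-length prefix that holds only for values up to 127 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: contrasts binary and textual encodings of one record.
class BinaryVersusTextDemo {

    // Approximates key.write(output); value.write(output); output.write("\n"...).
    static byte[] binaryRecord(long key, String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeLong(key); // 8 raw bytes, as LongWritable.write does
            byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
            // Assumption: the value is at most 127 bytes, so the length
            // prefix fits in a single byte.
            out.writeByte(valueBytes.length);
            out.write(valueBytes);
            out.write('\n');
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Human-readable alternative: key as decimal digits, separator, value, newline.
    static byte[] textRecord(long key, String separator, String value) {
        String line = key + separator + value + "\n";
        return line.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The binary form starts with the bytes of the long, not with digits.
        System.out.println(Arrays.toString(binaryRecord(1L, "hello")));
        // The text form is an ordinary tab-separated line.
        System.out.print(new String(textRecord(1L, "\t", "hello"), StandardCharsets.UTF_8));
    }
}
```

If plain-text output is the goal, textRecord corresponds roughly to the commented-out lines in write: the key written as a decimal string, then the separator, the value bytes, and a newline.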


James, Teng (Teng Linxiao)
eRL, CDC, eBay, Shanghai
Extension: 86-21-28913530
MSN: tenglinxiao@hotmail.com
Skype: James,Teng
Email: xteng@ebay.com