hadoop-mapreduce-user mailing list archives

From jamal sasha <jamalsha...@gmail.com>
Subject Inputformat
Date Fri, 21 Jun 2013 21:38:02 GMT
Hi,

I am using a library that relies on InputFormat.
Right now it reads XML files whose records span multiple lines.
The current input format looks like this:

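import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class XMLInputReader extends FileInputFormat<LongWritable, Text> {

  public static final String START_TAG = "<page>";
  public static final String END_TAG = "</page>";

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
      JobConf conf, Reporter reporter) throws IOException {
    // Tell the record reader which tags delimit one record.
    conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
    conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
    return new XMLRecordReader((FileSplit) split, conf);
  }
}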

With this, if the data looks like:

<page>

 something \n
something \n

</page>

it processes this sort of data.


Now I want to use the same framework for JSON files, where each record
takes just a single line.

So I guess my START_TAG can be "{".

Will my END_TAG be "}\n"?

It can't be just "}", since there can be nested JSON in this data, right?

Any clues?
Thanks
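
To make the question concrete, here is a minimal sketch in plain Java (no Hadoop) of what a start/end-tag scan does with one-JSON-object-per-line data. The class DelimiterSketch and the method extract are hypothetical names made up for this illustration, and the scan (take everything from the start tag up to the first end tag after it) is only an assumption about how a tag-based record reader might work, not the library's actual XMLRecordReader logic.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSketch {

    // Collects every substring that begins with startTag and ends at the
    // first endTag found after it (both tags are kept in the record).
    static List<String> extract(String data, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = data.indexOf(startTag, from);
            if (start < 0) {
                break; // no further record starts
            }
            int end = data.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record, stop scanning
            }
            end += endTag.length();
            records.add(data.substring(start, end));
            from = end;
        }
        return records;
    }

    public static void main(String[] args) {
        // Two single-line records, each containing a nested object.
        String data = "{\"id\": 1, \"user\": {\"name\": \"a\"}}\n"
                    + "{\"id\": 2, \"user\": {\"name\": \"b\"}}\n";

        // End tag "}": each record is cut at the first closing brace,
        // so the outer object loses its final "}".
        System.out.println(extract(data, "{", "}"));

        // End tag "}\n": each record runs to the brace that ends the line.
        System.out.println(extract(data, "{", "}\n"));
    }
}
```

Under these assumptions "}\n" does recover whole lines, but only if every record ends with exactly "}" followed by "\n" (no trailing spaces, no "\r\n" line endings), and a reader that starts scanning in the middle of a line could still pick up a nested "{" as a record start. Since each record is a single line, another option might be Hadoop's stock TextInputFormat, which already hands each line to the mapper as one record, if the library can accept it.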
