hadoop-mapreduce-user mailing list archives

From "Bae, Jae Hyeon" <metac...@gmail.com>
Subject MapFile.fix with block compressed SequenceFile
Date Mon, 22 Mar 2010 07:24:23 GMT
Hi everyone.

I tried to run MapFile.fix on a MapFile whose data file is a
block-compressed SequenceFile, but lookups on the fixed MapFile fail to
find several keys.

I investigated and traced the cause to SequenceFile.Reader.readBlock.

The relevant part of the MapFile.fix source code is the following:

      long pos = 0L;
      LongWritable position = new LongWritable();
      while(dataReader.next(key, value)) {
        cnt++;
        if (cnt % indexInterval == 0) {
          // pos is the reader position saved after the previous record
          position.set(pos);
          if (!dryrun) indexWriter.append(key, position);
        }
        pos = dataReader.getPosition();
      }

Initially, dataReader.getPosition() returns 121. After
dataReader.next() is called once on a block-compressed SequenceFile,
dataReader.getPosition() returns 2617389. This means the input stream
embedded in dataReader has already advanced to the second block of the
SequenceFile, even though the keys being indexed are still in the first
block. The index therefore records second-block positions for
first-block keys.
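The behaviour described above can be reproduced in a small toy model. The sketch below is plain Java, not Hadoop code: BlockIndexSketch, BlockReader, naiveIndex and blockAwareIndex are hypothetical names, the block size of 4 records, the index interval of 2 and the end-of-file offset 5000000 are made-up values, and only the offsets 121 and 2617389 come from the observation above. It assumes that reading the first record of a block moves the reader position to the start of the next block.

```java
import java.util.ArrayList;
import java.util.List;

class BlockIndexSketch {

    /** Toy stand-in for a block-compressed reader (hypothetical, not a Hadoop class). */
    static final class BlockReader {
        private final long[] blockOffsets;  // start of each block; last entry is end of file
        private final int recordsPerBlock;
        private int nextRecord = 0;
        private long position;

        BlockReader(long[] blockOffsets, int recordsPerBlock) {
            this.blockOffsets = blockOffsets;
            this.recordsPerBlock = recordsPerBlock;
            this.position = blockOffsets[0];
        }

        /** Returns the next key (here simply a record number), or -1 at end of data. */
        int next() {
            int total = (blockOffsets.length - 1) * recordsPerBlock;
            if (nextRecord >= total) return -1;
            if (nextRecord % recordsPerBlock == 0) {
                // Entering a new block: the whole block is consumed from the
                // stream at once, so the position jumps to the next block start.
                position = blockOffsets[nextRecord / recordsPerBlock + 1];
            }
            return nextRecord++;
        }

        long getPosition() { return position; }
    }

    /** Same shape as the quoted loop: index each key with the position saved after the previous record. */
    static long[][] naiveIndex(BlockReader reader, int indexInterval) {
        List<long[]> index = new ArrayList<>();
        long pos = reader.getPosition();  // the toy starts at the first block, not at 0
        long cnt = 0;
        int key;
        while ((key = reader.next()) != -1) {
            cnt++;
            if (cnt % indexInterval == 0) index.add(new long[] {key, pos});
            pos = reader.getPosition();
        }
        return index.toArray(new long[0][]);
    }

    /** Variant that indexes each key with the start offset of the block that holds it. */
    static long[][] blockAwareIndex(BlockReader reader, int indexInterval) {
        List<long[]> index = new ArrayList<>();
        long blockStart = reader.getPosition();
        long cnt = 0;
        while (true) {
            long before = reader.getPosition();
            int key = reader.next();
            if (key == -1) break;
            // If the position moved, a new block was loaded and it began at 'before'.
            if (reader.getPosition() != before) blockStart = before;
            cnt++;
            if (cnt % indexInterval == 0) index.add(new long[] {key, blockStart});
        }
        return index.toArray(new long[0][]);
    }

    public static void main(String[] args) {
        long[] offsets = {121L, 2617389L, 5000000L};
        for (long[] e : naiveIndex(new BlockReader(offsets, 4), 2)) {
            System.out.println("naive       key " + e[0] + " -> " + e[1]);
        }
        for (long[] e : blockAwareIndex(new BlockReader(offsets, 4), 2)) {
            System.out.println("block-aware key " + e[0] + " -> " + e[1]);
        }
    }
}
```

In this model the naive loop indexes keys 1 and 3 (both in the first block) at offset 2617389, while the block-aware variant indexes them at 121. Whether the real SequenceFile.Reader reports positions exactly this way is an assumption of the sketch, not something verified against Hadoop.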

In this case, how can I repair the MapFile's index data?

Do I have to give up block compression if I need to be able to recover
a MapFile's index data?

Thank you in advance.

Regards,
Jae
