hadoop-common-dev mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-9667) SequenceFile: Reset keys and values when syncing to a place before the header
Date Mon, 24 Jun 2013 16:38:21 GMT
Colin Patrick McCabe created HADOOP-9667:
--------------------------------------------

             Summary: SequenceFile: Reset keys and values when syncing to a place before the header
                 Key: HADOOP-9667
                 URL: https://issues.apache.org/jira/browse/HADOOP-9667
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Colin Patrick McCabe
            Priority: Minor


There seems to be a bug in the {{SequenceFile#sync}} function.  Thanks to Christopher Ng for
this report:

{code}
    /** Seek to the next sync mark past a given position.*/
    public synchronized void sync(long position) throws IOException {
      if (position+SYNC_SIZE >= end) {
        seek(end);
        return;
      }

      if (position < headerEnd) {
        // seek directly to first record
        in.seek(headerEnd);  // <==== question from the report: should this call seek() (i.e. this.seek) instead?
        // note the sync marker "seen" in the header
        syncSeen = true;
        return;
      }
      // ... remainder of sync() not included in the report
    }
{code}

The problem is that when you sync to the start of a block-compressed file,
{{noBufferedKeys}} and {{valuesDecompressed}} are not reset, so no block read is
triggered. When you subsequently call {{next()}}, you may get keys from the
buffer, which still holds keys from the previous position in the file.
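
To make the failure mode concrete, here is a toy model, not Hadoop code. {{ToyReader}}, {{syncBeforeHeader()}} and {{firstKeyAfterSync()}} are hypothetical names, and only the {{position < headerEnd}} branch of {{sync()}} is modelled. It assumes, as the suggested fix implies, that the reader's own {{seek()}} resets {{noBufferedKeys}} (the real one also sets {{valuesDecompressed}}), while a raw stream seek only moves the position. The reader reads the first block, then part of the second, and then syncs back to the start:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy model of a block-compressed reader; not the Hadoop implementation. */
public class SyncBugDemo {

  static final class ToyReader {
    private final List<List<String>> blocks;                 // the "file": one entry per compressed block
    private final Deque<String> buffer = new ArrayDeque<>();  // keys of the current decompressed block
    private int pos;                                          // next block to read (stands in for in.getPos())
    private int noBufferedKeys;                               // mirrors the Hadoop field of the same name
    private final boolean useThisSeek;                        // true: the suggested fix

    ToyReader(List<List<String>> blocks, boolean useThisSeek) {
      this.blocks = blocks;
      this.useThisSeek = useThisSeek;
    }

    /** Moves the underlying stream only, like in.seek(); buffered keys survive. */
    private void rawSeek(int position) {
      pos = position;
    }

    /** Assumed behaviour of Reader#seek(): move, then force a block read on next(). */
    void seek(int position) {
      rawSeek(position);
      noBufferedKeys = 0;
    }

    /** Models only the "position < headerEnd" branch of sync(). */
    void syncBeforeHeader() {
      if (useThisSeek) {
        seek(0);
      } else {
        rawSeek(0);
      }
    }

    /** Returns the next key, reading a new block only when the buffer is empty. */
    String next() {
      if (noBufferedKeys == 0) {
        if (pos >= blocks.size()) {
          return null;
        }
        buffer.clear();
        buffer.addAll(blocks.get(pos++));
        noBufferedKeys = buffer.size();
      }
      noBufferedKeys--;
      return buffer.poll();
    }
  }

  static String firstKeyAfterSync(boolean useThisSeek) {
    ToyReader r = new ToyReader(
        List.of(List.of("a1", "a2", "a3"), List.of("b1", "b2", "b3")), useThisSeek);
    for (int i = 0; i < 4; i++) {
      r.next(); // consumes a1, a2, a3, b1; b2 and b3 stay buffered
    }
    r.syncBeforeHeader();
    return r.next();
  }

  public static void main(String[] args) {
    System.out.println("in.seek():   " + firstKeyAfterSync(false)); // prints "b2" (stale key)
    System.out.println("this.seek(): " + firstKeyAfterSync(true));  // prints "a1"
  }
}
```

With the stream-only seek the first key after the sync is {{b2}}, left over from the second block; routing through {{seek()}} makes {{next()}} reload the first block and return {{a1}}.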


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
