hadoop-common-dev mailing list archives

From "Peter Spiro (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-6494) MapFile.Reader does not seek to first entry for multi-valued key
Date Fri, 15 Jan 2010 01:02:54 GMT
MapFile.Reader does not seek to first entry for multi-valued key
----------------------------------------------------------------

                 Key: HADOOP-6494
                 URL: https://issues.apache.org/jira/browse/HADOOP-6494
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
            Reporter: Peter Spiro
            Priority: Minor


When a MapFile contains a key with multiple entries, and an entry other than the first one
happens to be the one stored in the index, the Reader's seek() and get*() methods will
generally not return the first entry. This makes it impossible to retrieve all of the key's
entries using next().
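
The failure can be reproduced in a simplified model that does not use Hadoop at all. The
sketch below assumes that an index entry is written for every interval-th record and that a
seek jumps to the last indexed record whose key is not greater than the target, then scans
forward. The names SparseIndexSketch, indexedRecords and seek are hypothetical, and the real
Reader searches its index differently, so treat this as an illustration of the effect only.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a sparse index over sorted records; not Hadoop code.
class SparseIndexSketch {

    // Record numbers that receive an index entry when every interval-th record is indexed.
    static List<Integer> indexedRecords(String[] keys, int interval) {
        List<Integer> indexed = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            if (i % interval == 0) {
                indexed.add(i);
            }
        }
        return indexed;
    }

    // Assumed seek model: start at the last indexed record whose key <= target,
    // then scan forward to the first record whose key >= target.
    static int seek(String[] keys, List<Integer> indexed, String target) {
        int start = 0;
        for (int rec : indexed) {
            if (keys[rec].compareTo(target) <= 0) {
                start = rec;
            }
        }
        for (int i = start; i < keys.length; i++) {
            if (keys[i].compareTo(target) >= 0) {
                return i;
            }
        }
        return -1; // no record with key >= target
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "b", "b", "c"};
        List<Integer> indexed = indexedRecords(keys, 2);
        System.out.println("indexed records: " + indexed);                 // [0, 2, 4]
        System.out.println("seek(b) lands on record " + seek(keys, indexed, "b")); // 2
    }
}
```

With keys a, b, b, b, c and an interval of 2, the second b (record 2) is the one indexed, so
the modelled seek lands on record 2 and the b at record 1 can never be reached by scanning
forward.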

One easy solution would be to modify the Writer's append() method to only index an entry if
it's the first entry belonging to its key, e.g.:


    public synchronized void append(WritableComparable key, Writable val)
      throws IOException {

      // compare against lastKey before checkKey(), which overwrites lastKey
      boolean equalsLastKey = (size != 0 && comparator.compare(lastKey, key) == 0);
      checkKey(key);

      boolean largeEnoughInterval = size % indexInterval == 0;
      if (largeEnoughInterval && !equalsLastKey) {  // add an index entry
        position.set(data.getLength());             // point to current eof
        index.append(key, position);
      }

      data.append(key, val);                        // append key/value to data

      // hold the counter at the interval boundary while the key repeats, so that
      // the first entry of the next distinct key is the one that gets indexed
      if (!largeEnoughInterval || !equalsLastKey) {
        size++;
      }
    }


(Since size would then no longer count every appended entry, it should be renamed to
something more accurate.)
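
The indexing rule proposed above can be checked in isolation with a small model that does
not depend on Hadoop. The sketch below is an assumption-laden illustration: the class
FirstEntryIndexSketch and its method indexedRecords are hypothetical, plain strings stand in
for WritableComparable keys, and the local counter plays the role of size.

```java
import java.util.ArrayList;
import java.util.List;

// Models only the proposed indexing decision; not Hadoop code.
class FirstEntryIndexSketch {

    // Record numbers that would receive an index entry under the proposed rule.
    static List<Integer> indexedRecords(String[] keys, int interval) {
        List<Integer> indexed = new ArrayList<>();
        int counter = 0;       // stands in for the Writer's size field
        String lastKey = null; // key of the previously appended record

        for (int i = 0; i < keys.length; i++) {
            boolean equalsLastKey = lastKey != null && lastKey.equals(keys[i]);
            boolean largeEnoughInterval = counter % interval == 0;

            if (largeEnoughInterval && !equalsLastKey) {
                indexed.add(i); // only the first record of a key is indexed
            }
            // The counter stalls at an interval boundary while the key repeats.
            if (!largeEnoughInterval || !equalsLastKey) {
                counter++;
            }
            lastKey = keys[i];
        }
        return indexed;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "b", "b", "c"};
        System.out.println(indexedRecords(keys, 2)); // [0, 4]
    }
}
```

For keys a, b, b, b, c with an interval of 2, the boundary falls on the second b, so no index
entry is written until c arrives. Every indexed record is then the first record of its key.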




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

