hbase-dev mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()
Date Tue, 22 Sep 2009 08:32:16 GMT
HBASE-1765 broke MapReduce when using Result.list()
---------------------------------------------------

                 Key: HBASE-1856
                 URL: https://issues.apache.org/jira/browse/HBASE-1856
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Lars George
            Priority: Critical
             Fix For: 0.20.1


Not sure if it is just me, but running MR over HBase with a TableReducer is not working.
After the first row is read, all subsequent rows return the Results of that very first row.
Tracing this from the Map phase, I found the culprit in Result and the delayed field parsing
change from HBASE-1765.

This is the code I use in the reduce():

{code}
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Result> values,
        Context context) throws IOException, InterruptedException {
      String skey = Bytes.toString(key.get());
      context.getCounter(CountersTotals.ROWS).increment(1);
      for (Result result : values) {
        for (KeyValue kv: result.list()) {
          try {
            if (LOG.isDebugEnabled()) LOG.debug("reduce: key -> " + skey + ", kv -> " + kv);
            ...
{code}

Here is the current list() implementation:

{code}
  public List<KeyValue> list() {
    if(this.kvs == null) {
      readFields();
    }
    return isEmpty()? null: Arrays.asList(sorted());
  }
{code}

The problem is that readFields(DataInput) does not clear kvs!

{code}
  public void readFields(final DataInput in)
  throws IOException {
    familyMap = null;
    row = null;
    int totalBuffer = in.readInt();
    if(totalBuffer == 0) {
      bytes = null;
      return;
    }
    byte [] raw = new byte[totalBuffer];
    in.readFully(raw, 0, totalBuffer);
    bytes = new ImmutableBytesWritable(raw, 0, totalBuffer);
  }
{code}

The above is called by the MR framework's WritableSerialization for each map output, and the
framework reuses the same Result instance between records. But since "kvs" is still set from
the previous record, "list()" returns the old data!

I assume the only change needed is clearing kvs as well:

{code}
  public void readFields(final DataInput in)
  throws IOException {
    familyMap = null;
    row = null;
    kvs = null;
    ....
{code}
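
For illustration only, the failure mode can be reproduced outside HBase with a minimal sketch. It assumes a simplified stand-in class: `LazyRecord`, its comma-separated payload, the `clearCache` flag, and the helpers `serialize` and `readTwice` are all hypothetical names invented for this example, not part of the HBase API. The sketch mimics only the pattern described above: one instance is deserialized twice via `readFields(DataInput)`, and `list()` parses the raw bytes lazily into a cached field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class StaleCacheDemo {

  /** Hypothetical stand-in for Result: stores raw bytes and parses them lazily. */
  static class LazyRecord {
    private byte[] bytes;              // raw serialized form, analogous to Result.bytes
    private String[] kvs;              // lazily parsed cache, analogous to Result.kvs
    private final boolean clearCache;  // true = apply the proposed fix

    LazyRecord(boolean clearCache) {
      this.clearCache = clearCache;
    }

    void readFields(DataInput in) throws IOException {
      if (clearCache) {
        kvs = null;                    // the proposed one-line fix
      }
      int total = in.readInt();
      if (total == 0) {
        bytes = null;
        return;
      }
      byte[] raw = new byte[total];
      in.readFully(raw, 0, total);
      bytes = raw;
    }

    List<String> list() {
      // Parse only if nothing is cached; a stale cache short-circuits this.
      if (kvs == null && bytes != null) {
        kvs = new String(bytes, StandardCharsets.UTF_8).split(",");
      }
      return kvs == null ? null : Arrays.asList(kvs);
    }
  }

  /** Writes a length-prefixed payload and returns a DataInput positioned at its start. */
  static DataInput serialize(String payload) throws IOException {
    byte[] raw = payload.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(raw.length);
    out.write(raw);
    out.flush();
    return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
  }

  /** Reuses one instance for two records, as a deserializer that recycles objects would. */
  public static List<String> readTwice(boolean clearCache) throws IOException {
    LazyRecord record = new LazyRecord(clearCache);
    record.readFields(serialize("row1:a,row1:b"));
    record.list();                     // populates the cache from the first record
    record.readFields(serialize("row2:c"));
    return record.list();              // should reflect the second record
  }

  public static void main(String[] args) throws IOException {
    System.out.println("without clearing kvs: " + readTwice(false)); // [row1:a, row1:b]
    System.out.println("with clearing kvs:    " + readTwice(true));  // [row2:c]
  }
}
```

Without the reset, the second `list()` call still returns the first record's values; with it, the second record is parsed as expected. Whether clearing `kvs` is the only change needed in the real `Result` class is the assumption being tested in this report.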

I'll test that now and report.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

