avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harsh J Chouraria (JIRA)" <j...@apache.org>
Subject [jira] Created: (AVRO-764) Possible issue with BinaryData.compare(...) used in Map/Reduce
Date Sun, 13 Feb 2011 15:37:57 GMT
Possible issue with BinaryData.compare(...) used in Map/Reduce

                 Key: AVRO-764
                 URL: https://issues.apache.org/jira/browse/AVRO-764
             Project: Avro
          Issue Type: Bug
          Components: doc
    Affects Versions: 1.4.1
         Environment: Linux, CDH3
            Reporter: Harsh J Chouraria
            Assignee: Harsh J Chouraria
            Priority: Critical
             Fix For: 1.5.0

MapReduce's RawComparator feature has a call {{compare(b1, start1, length1, b2, start2, length2)}}
which is handled for Avro using {{BinaryData.compare(b1, start1, b2, start2, schema)}}

BinaryDecoder, used by the BinaryData.compare(b1, start1, b2, start2, schema) utility, has
a sub-clause that handles byte array inputs of less than 16 bytes in length.

This is the exact code which am talking about:
{code:title=BinaryDecoder.java Lines 879-893|borderStyle=solid}
    private ByteArrayByteSource(byte[] data, int start, int len) {
      // make sure data is not too small, otherwise getLong may try and
      // read 10 bytes and get index out of bounds.
      if (data.length < 16 || len < 16) {
        this.data = new byte[16];
        System.arraycopy(data, start, this.data, 0, len);
        this.position = 0;
        this.max = len;
      } else {
        // use the array passed in
        this.data = data;
        this.position = start;
        this.max = start + len;

This clause would fail during {{arrayCopy}} since the target {{len}} bytes is sent into the
constructor as the {{length}} of the byte array input itself, and not the length as given
by the framework which should be the right one [Since the BD.compare() call does not take
it as a parameter]. Thus, arrayCopy would be trying to copy from {{start}} byte index to {{length}}
bytes after it, where {{length}} is the byte array's {{.length}} itself -- which leads to
an index bounds exception.

Here is a slightly low level test case that should fail with an {{ArrayIndexOutOfBoundsException}}
due to a bad {{System.arrayCopy}} call:
package org.apache.avro.io;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;
import org.apache.avro.Schema.Type;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericDatumWriter;
import org.junit.Assert;
import org.junit.Test;

public class TestBinaryData {

  public void testCompare() {
    // Prepare a schema for testing.
    Field integerField = new Field("test", Schema.create(Type.INT), null, null);
    List<Field> fields = new ArrayList<Field>();
    Schema record = Schema.createRecord("test", null, null, false);
    ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
    ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
    BinaryEncoder b1Enc = new BinaryEncoder(b1);
    BinaryEncoder b2Enc = new BinaryEncoder(b2);
    Record testDatum1 = new Record(record);
    testDatum1.put(0, 1);
    Record testDatum2 = new Record(record);
    testDatum2.put(0, 2);
    GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
    Integer start1 = 0, start2 = 0;
    try {
      gWriter.write(testDatum1, b1Enc);
      start1 = b1.size();
      gWriter.write(testDatum1, b1Enc);
      gWriter.write(testDatum2, b2Enc);
      start2 = b2.size();
      gWriter.write(testDatum2, b2Enc);
      BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2, record);
    } catch (IOException e) {
      Assert.fail("IOException while writing records to output stream.");

A solution would be to let the length, as given by the MapReduce framework itself, be used
instead of using {{bytearray.length}} in its place. I'll attach a patch for this soon.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message