avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harsh J Chouraria (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-764) Possible issue with BinaryData.compare(...) used in Map/Reduce
Date Sun, 13 Feb 2011 15:39:57 GMT

     [ https://issues.apache.org/jira/browse/AVRO-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Harsh J Chouraria updated AVRO-764:
-----------------------------------

    Component/s:     (was: doc)
                 java

> Possible issue with BinaryData.compare(...) used in Map/Reduce
> --------------------------------------------------------------
>
>                 Key: AVRO-764
>                 URL: https://issues.apache.org/jira/browse/AVRO-764
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.1
>         Environment: Linux, CDH3
>            Reporter: Harsh J Chouraria
>            Assignee: Harsh J Chouraria
>            Priority: Critical
>              Labels: hadoop
>             Fix For: 1.5.0
>
>
> MapReduce's RawComparator feature has a call {{compare(b1, start1, length1, b2, start2,
length2)}} which is handled for Avro using {{BinaryData.compare(b1, start1, b2, start2, schema)}}
> BinaryDecoder, used by the BinaryData.compare(b1, start1, b2, start2, schema) utility,
has a sub-clause that handles byte array inputs of less than 16 bytes in length.
> This is the exact code which am talking about:
> {code:title=BinaryDecoder.java Lines 879-893|borderStyle=solid}
>     private ByteArrayByteSource(byte[] data, int start, int len) {
>       super();
>       // make sure data is not too small, otherwise getLong may try and
>       // read 10 bytes and get index out of bounds.
>       if (data.length < 16 || len < 16) {
>         this.data = new byte[16];
>         System.arraycopy(data, start, this.data, 0, len);
>         this.position = 0;
>         this.max = len;
>       } else {
>         // use the array passed in
>         this.data = data;
>         this.position = start;
>         this.max = start + len;
>       }
>     }
> {code}
> This clause would fail during {{arrayCopy}} since the target {{len}} bytes is sent into
the constructor as the {{length}} of the byte array input itself, and not the length as given
by the framework which should be the right one [Since the BD.compare() call does not take
it as a parameter]. Thus, arrayCopy would be trying to copy from {{start}} byte index to {{length}}
bytes after it, where {{length}} is the byte array's {{.length}} itself -- which leads to
an index bounds exception.
> Here is a slightly low level test case that should fail with an {{ArrayIndexOutOfBoundsException}}
due to a bad {{System.arrayCopy}} call:
> {code:title=TestBinaryData.java|borderStyle=solid}
> package org.apache.avro.io;
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.avro.Schema;
> import org.apache.avro.Schema.Field;
> import org.apache.avro.Schema.Type;
> import org.apache.avro.generic.GenericData.Record;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.junit.Assert;
> import org.junit.Test;
> public class TestBinaryData {
>   @Test
>   public void testCompare() {
>     // Prepare a schema for testing.
>     Field integerField = new Field("test", Schema.create(Type.INT), null, null);
>     List<Field> fields = new ArrayList<Field>();
>     fields.add(integerField);
>     Schema record = Schema.createRecord("test", null, null, false);
>     record.setFields(fields);
>     
>     ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
>     ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
>     BinaryEncoder b1Enc = new BinaryEncoder(b1);
>     BinaryEncoder b2Enc = new BinaryEncoder(b2);
>     Record testDatum1 = new Record(record);
>     testDatum1.put(0, 1);
>     Record testDatum2 = new Record(record);
>     testDatum2.put(0, 2);
>     GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
>     Integer start1 = 0, start2 = 0;
>     try {
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       start1 = b1.size();
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       b1.close();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       start2 = b2.size();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       b2.close();
>       BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2, record);
>     } catch (IOException e) {
>       Assert.fail("IOException while writing records to output stream.");
>     }
>   }
> }
> {code}
> A solution would be to let the length, as given by the MapReduce framework itself, be
used instead of using {{bytearray.length}} in its place. I'll attach a patch for this soon.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message