avro-user mailing list archives

From Dave Oshinsky <doshin...@commvault.com>
Subject Re: bytes array for decimal logical type sometimes corrupted
Date Sat, 05 Dec 2015 04:48:42 GMT
No, the byte array (taken directly from the ByteBuffer) is converted straight into a BigInteger.
The logic for reading from Avro or Parquet (the latter using the Avro Parquet reader) is like this:


            ByteBuffer bb = (ByteBuffer) obj;   // obtained from GenericRecord
            byte[] b = bb.array();
            BigInteger bi = new BigInteger(b);
            BigDecimal bd = new BigDecimal(bi, scale);
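
One possible explanation, offered as an assumption rather than something confirmed in this thread: ByteBuffer.array() returns the entire backing array and ignores the buffer's position and limit. If the reader hands back a buffer whose backing array is larger than the value it holds (for example, a reused buffer), array() would include stale trailing bytes. The sketch below copies only the bytes between position and limit; the class and method names (DecimalBytesReader, remainingBytes, toDecimal) are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

final class DecimalBytesReader {

    // Copies only the bytes between position and limit.
    // duplicate() gives an independent position/limit, so the caller's buffer is not advanced.
    static byte[] remainingBytes(ByteBuffer bb) {
        ByteBuffer view = bb.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    // Interprets the remaining bytes as a big-endian two's-complement unscaled value.
    static BigDecimal toDecimal(ByteBuffer bb, int scale) {
        return new BigDecimal(new BigInteger(remainingBytes(bb)), scale);
    }

    public static void main(String[] args) {
        // Simulated reused buffer (an assumption): a 4-byte backing array
        // holding a 3-byte value followed by one stale byte.
        ByteBuffer reused = ByteBuffer.wrap(new byte[] {21, -125, 108, 32});
        reused.limit(3);
        System.out.println(toDecimal(reused, 2));   // 14099.00
    }
}
```

In this simulated case, reused.array() would still return all four bytes, including the trailing 32, which matches the symptom described.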

The logic for writing to Avro or Parquet (the latter using the Avro Parquet writer) is like this:

            BigDecimal bd = (BigDecimal) obj;
            ...
            BigInteger bi = bd.unscaledValue();
            byte[] barray = bi.toByteArray();
            putBinary(rec, name, barray);
            ...
    public static void putBinary(GenericRecord rec, String name, byte[] b) {
        ByteBuffer byteBuffer = ByteBuffer.allocate(b.length);
        byteBuffer.put(b);
        byteBuffer.rewind();   // if not done, writes nothing to avro or parquet
        rec.put(name, byteBuffer);
    }
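
As a side note on the write path, the allocate/put/rewind sequence can be avoided: ByteBuffer.wrap(byte[]) returns a buffer whose position is 0 and whose limit is the array length, so it is ready to be read without a rewind. The following is a minimal sketch of that alternative, not a change the thread itself proposes; DecimalBytesWriter and toBuffer are hypothetical names.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

final class DecimalBytesWriter {

    // Encodes the unscaled value as big-endian two's-complement bytes and wraps them.
    // wrap() does not copy: the returned buffer shares the array, with position 0
    // and limit equal to the array length.
    static ByteBuffer toBuffer(BigDecimal bd) {
        byte[] unscaled = bd.unscaledValue().toByteArray();
        return ByteBuffer.wrap(unscaled);
    }

    public static void main(String[] args) {
        ByteBuffer buf = toBuffer(new BigDecimal("14099.00"));
        System.out.println(buf.position() + " " + buf.remaining());   // 0 3
    }
}
```

Because wrap() shares the array, the caller should not modify the byte array after handing the buffer over.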

The byte array that was written to Avro was 3 bytes long, as explained earlier.  The
byte array that was read back from Avro was 4 bytes long, with a trailing byte of decimal
value 32 (an ASCII space) padding the end.

Either I am using Avro incorrectly, or the bytes that are read back are not the same as the
bytes originally written.  The exact same code works properly with Parquet (accessed using
Avro Parquet reader and writer).  I followed the format for the bytes (representing the decimal
number) as specified here:
https://avro.apache.org/docs/1.7.7/spec.html#Decimal

________________________________
From: seemanto.barua@nomura.com <seemanto.barua@nomura.com>
Sent: Friday, December 4, 2015 6:34 PM
To: user@avro.apache.org
Subject: Re: bytes array for decimal logical type sometimes corrupted

When reading the byte[] from avro do you copy the bytebuffer to your own array?

From: Dave Oshinsky [mailto:doshinsky@commvault.com]
Sent: Friday, December 04, 2015 05:47 PM
To: user@avro.apache.org <user@avro.apache.org>
Subject: bytes array for decimal logical type sometimes corrupted

I am a new user of Avro 1.7.7.  My (Java) application is reading rows from an Oracle DB, and
archiving them to Avro (and Parquet).  For NUMBER Oracle data, my code converts the unscaled
BigInteger (from BigDecimal) number into a bytes array, and archives that to Avro using a
ByteBuffer in the GenericRecord.  In one case, the NUMBER value from Oracle is 14099 (precision
8, scale 2, for a column named “CURBAL”), which is archived to Avro based on the unscaled
value of 1409900.  This corresponds to a bytes array of length 3, consisting of these 3 bytes
(decimal values):  21, -125, and 108.  When my code reads this CURBAL value back from Avro,
it is corrupted (padded?), with a fourth byte added that happens to be decimal 32 (an ASCII
space), i.e., the 4 byte decimal values seen upon reading back from Avro are 21, -125, 108,
and 32.  Has anyone seen a similar issue?  I am archiving the same data to Parquet, and reading
it back without any corruption.  I am wondering whether I am using Avro improperly here.
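
The byte values quoted above can be checked with plain java.math, without any Avro dependency. This is a small sketch under the assumption that the value is first brought to scale 2 with setScale; the class and method names (CurbalBytes, encode, decode) are hypothetical. It also shows what the four bytes read back would decode to if all of them were treated as the unscaled value.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

final class CurbalBytes {

    // Unscaled value as big-endian two's-complement bytes.
    // setScale without a rounding mode throws ArithmeticException if rounding would be needed.
    static byte[] encode(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        // 14099 at scale 2 has unscaled value 1409900 = 0x15836C.
        byte[] written = encode(new BigDecimal("14099"), 2);
        System.out.println(Arrays.toString(written));   // [21, -125, 108]

        // The four bytes reported on read: the extra trailing byte multiplies
        // the unscaled value by 256 and adds 32.
        byte[] readBack = {21, -125, 108, 32};
        System.out.println(decode(readBack, 2));        // 3609344.32
    }
}
```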

The schema that I’m using is shown below my signature, with various additions (look for
“cv_” prepended in the key) for JDBC ResultSetMetaData info that my code is preserving
for later on.  I have attached sample Avro and Parquet files to this email, with corruption
in CURBAL of the tenth record of the Avro.  (I realize that the attachments may not get forwarded
– let me know if I should send them to you individually.)  Interestingly, if I write this
tenth record to another Avro file as the first record, it does not get corrupted (an alignment/padding
issue?).

Thanks in advance,
Dave Oshinsky
Commvault Systems
doshinsky@commvault.com

JSON schema:

{
  "type" : "record",
  "name" : "my_table",
  "namespace" : "com.commvault",
  "fields" : [ {
    "name" : "ACCT_NO",
    "type" : {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 0,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_connection" : "oracle.jdbc.driver.T4CConnection",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 0,
      "cv_precision" : 20,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 1,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    }
  }, {
    "name" : "SF_NO",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 10,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 2,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "LF_NO",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 10,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 3,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "BRANCH_NO",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 0,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 20,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 4,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    } ]
  }, {
    "name" : "INTRO_CUST_NO",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 0,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 20,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 5,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    } ]
  }, {
    "name" : "INTRO_ACCT_NO",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 0,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 20,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 6,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    } ]
  }, {
    "name" : "INTRO_SIGN",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 1,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 7,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "TYPE",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 2,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 8,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "OPR_MODE",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 2,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 9,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "CUR_ACCT_TYPE",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 4,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 10,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "TITLE",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 30,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 11,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "CORP_CUST_NO",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 0,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 20,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 12,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    } ]
  }, {
    "name" : "APLNDT",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.sql.Timestamp",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 0,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 13,
      "cv_type" : 93,
      "cv_typename" : "DATE",
      "cv_writable" : true
    } ]
  }, {
    "name" : "OPNDT",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.sql.Timestamp",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 0,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 14,
      "cv_type" : 93,
      "cv_typename" : "DATE",
      "cv_writable" : true
    } ]
  }, {
    "name" : "VERI_EMP_NO",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 0,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 20,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 15,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    } ]
  }, {
    "name" : "VERI_SIGN",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 1,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 16,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "MANAGER_SIGN",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 1,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 17,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  }, {
    "name" : "CURBAL",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 8,
      "scale" : 2,
      "cv_auto_incr" : false,
      "cv_case_sensitive" : false,
      "cv_column_class" : "java.math.BigDecimal",
      "cv_currency" : true,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 8,
      "cv_read_only" : false,
      "cv_scale" : 2,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 18,
      "cv_type" : 2,
      "cv_typename" : "NUMBER",
      "cv_writable" : true
    } ]
  }, {
    "name" : "STATUS",
    "type" : [ "null", {
      "type" : "string",
      "cv_auto_incr" : false,
      "cv_case_sensitive" : true,
      "cv_column_class" : "java.lang.String",
      "cv_currency" : false,
      "cv_def_writable" : false,
      "cv_nullable" : 1,
      "cv_precision" : 1,
      "cv_read_only" : false,
      "cv_scale" : 0,
      "cv_searchable" : true,
      "cv_signed" : true,
      "cv_subscript" : 19,
      "cv_type" : 12,
      "cv_typename" : "VARCHAR2",
      "cv_writable" : true
    } ]
  } ]
}


***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

PLEASE READ: This message is for the named person's use only. It may contain confidential,
proprietary or legally privileged information. No confidentiality or privilege is waived or
lost by any mistransmission. If you receive this message in error, please delete it and all
copies from your system, destroy any hard copies and notify the sender. You must not, directly
or indirectly, use, disclose, distribute, print, or copy any part of this message if you are
not the intended recipient. Nomura Holding America Inc., Nomura Securities International,
Inc, and their respective subsidiaries each reserve the right to monitor all e-mail communications
through its networks. Any views expressed in this message are those of the individual sender,
except where the message states otherwise and the sender is authorized to state the views
of such entity. Unless otherwise stated, any pricing information in this message is indicative
only, is subject to change and does not constitute an offer to deal at any price quoted. Any
reference to the terms of executed transactions should be treated as preliminary only and
subject to our formal written confirmation.


