avro-user mailing list archives

From Sid Shetye <sid...@outlook.com>
Subject RE: unsigned 32bit (uint) in Avro - C# ?
Date Thu, 13 Feb 2014 00:04:41 GMT
I went through that thread. None of the arguments is convincing from a design standpoint, because:

1. Avro is used in non-Java environments. The Avro IDL is language-agnostic while the code-gen
is language-specific, so the C# code-gen could emit unsigned types. Every language has limitations,
but I'm not sure why Java's limitations should drive Avro's design, despite the heritage. (It's
going to grow into other languages, right?)
2. Unsigned 32/64-bit values have been used extensively as primitive types for over three decades,
i.e. they've held their ground. Heck, even core Java devs hate that unsigned doesn't exist, e.g.
http://stackoverflow.com/questions/430346/why-doesnt-java-support-unsigned-ints
3. All other workarounds simply add more friction to development when, in reality, working
with a primitive data type that's been around "forever" should be transparent and
fluid.

Stepping off the soapbox, I also have a workaround for future readers. We cast uint <-> int
with arithmetic overflow checking temporarily disabled (C#'s unchecked), and then let Avro
handle the values as signed varints (aka zigzag varints). Example code:

int avroInt32;     // code-gen'd from the IDL
uint csharpUint32; // application-domain variable

// to Avro DTO: reinterpret the 32 bits as signed, no overflow check
avroInt32 = unchecked((int)csharpUint32);

// from Avro DTO: reinterpret the 32 bits as unsigned
csharpUint32 = unchecked((uint)avroInt32);
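
For future readers, here is a minimal, self-contained sketch of the same cast wrapped in two helpers, with a round-trip check. The names UintAvroBridge, ToAvroInt and FromAvroInt are hypothetical and are not part of any Avro library. The sketch only illustrates that the unchecked cast keeps all 32 bits, so values above int.MaxValue come back intact even though they travel as negative ints.

```csharp
using System;

// Hypothetical helper, not part of the Avro C# library.
public static class UintAvroBridge
{
    // Reinterpret the 32 bits of a uint as an int, with no range check.
    public static int ToAvroInt(uint value) => unchecked((int)value);

    // Reverse mapping: reinterpret the int's 32 bits as a uint.
    public static uint FromAvroInt(int value) => unchecked((uint)value);

    public static void Main()
    {
        uint[] samples = { 0u, 1u, 2147483647u, 2147483648u, uint.MaxValue };
        foreach (uint original in samples)
        {
            int wire = ToAvroInt(original);      // value stored in the Avro int field
            uint restored = FromAvroInt(wire);   // value handed back to the app domain
            Console.WriteLine($"{original} -> {wire} -> {restored}");
        }
        // 2147483648 travels as -2147483648 and uint.MaxValue travels as -1;
        // both are restored to their original unsigned values.
    }
}
```

Keeping the two casts in one place means the rest of the application never sees the signed representation.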

Pros:
a) Use the encoding compression inherent in varints (e.g. values up to 134,217,727 fit in 4 bytes)
b) Keep the application domain logic as unsigned (as it needs to be)
c) Minimize the glue logic / impedance when converting from app domain => DTO domain

Cons:
1) Specific glue code is needed because Avro inherits Java's limitations
2) We're still wasting half of the addressable range, since we skip every other possible
varint encoding (reserved for negative numbers) when we only see positive numbers. That means
that instead of hitting my 5th varint byte after 268,435,455, I now need that 5th byte at half
that: after 134,217,727. It's not *too* bad, but it seems wasteful to always transport a bit
that's never used (for values up to int.MaxValue, bit 0, a zigzag varint's 'sign bit', is
always 0 and carries no information).
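
The byte counts quoted above can be checked with a short sketch. It assumes the zigzag mapping and 7-bits-per-byte varint layout that the Avro spec describes for int; the class and method names (ZigZagVarintDemo, ZigZag, VarintLength, AvroIntLength) are hypothetical, and the code only computes lengths rather than calling into any Avro library.

```csharp
using System;

// Hypothetical demo class: computes encoded lengths only, does not use Avro itself.
public static class ZigZagVarintDemo
{
    // Zigzag mapping: 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ...
    public static uint ZigZag(int n) => unchecked((uint)((n << 1) ^ (n >> 31)));

    // Bytes needed by a base-128 varint (7 payload bits per byte).
    public static int VarintLength(uint v)
    {
        int bytes = 1;
        while (v >= 0x80)
        {
            v >>= 7;
            bytes++;
        }
        return bytes;
    }

    // Length of a 32-bit value encoded as a zigzag varint.
    public static int AvroIntLength(int n) => VarintLength(ZigZag(n));

    public static void Main()
    {
        Console.WriteLine(AvroIntLength(134217727));  // prints 4 (zigzag varint)
        Console.WriteLine(AvroIntLength(134217728));  // prints 5 (zigzag varint)
        Console.WriteLine(VarintLength(268435455u));  // prints 4 (plain unsigned varint)
        Console.WriteLine(VarintLength(268435456u));  // prints 5 (plain unsigned varint)

        // A uint above int.MaxValue becomes negative after the cast and takes 5 bytes.
        Console.WriteLine(AvroIntLength(unchecked((int)3000000000u)));  // prints 5
    }
}
```

The last line shows that uint values above int.MaxValue still round-trip through the cast; they simply always occupy the full 5 bytes.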

Cheers
Sid

> From: harsh@cloudera.com
> Date: Wed, 12 Feb 2014 17:50:02 +0530
> Subject: Re: unsigned 32bit (uint) in Avro - C# ?
> To: user@avro.apache.org
> 
> See also this past thread on the topic perhaps:
> http://mail-archives.apache.org/mod_mbox/avro-user/201212.mbox/%3c50D38260.8060402@methodstudios.com%3e
> 
> On Mon, Feb 10, 2014 at 3:46 PM, Mika Ristimaki
> <mika.ristimaki@gmail.com> wrote:
> > Hi,
> >
> > Java doesn't have unsigned primitives, so most likely Avro doesn't support
> > them directly either.
> >
> > -Mika
> >
> > On Feb 10, 2014, at 3:34 AM, Sid Shetye <sid314@outlook.com> wrote:
> >
> > How do I serialize an unsigned integer (uint or UInt32 in C#) in Avro?
> >
> > It's very bizarre that unsigned aren't discussed at
> > http://avro.apache.org/docs/1.7.6/spec.html#schema_primitive
> >
> >
> >
> >
> 
> 
> 
> -- 
> Harsh J
 		 	   		  