avro-dev mailing list archives

From "Thiruvalluvan M. G. (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-435) Support Set containers
Date Tue, 02 Mar 2010 02:48:05 GMT

    [ https://issues.apache.org/jira/browse/AVRO-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839982#action_12839982 ]

Thiruvalluvan M. G. commented on AVRO-435:

+1 for "unique":"true".

Arrays differ from sets in two ways:
   - Arrays guarantee the order of elements and sets don't. A set can simply ignore the
order, so using arrays to implement sets shouldn't be a problem on this count.
   - Arrays do not guarantee uniqueness. Using arrays for sets would work if we can define
_equality_ in the spec. Unfortunately, we cannot say "two entities are equal if their schemas
are equal and their binary representations in Avro are equal". The trouble comes from maps (and
sets, when we have them): Avro does not enforce an order on their elements, so two equal maps
can have different encodings. But we can define _equality_ something like this:
      - Two primitive entities are equal if and only if their schemas and Avro binary representations
are equal.
      - Two complex entities (other than maps and sets) are equal if and only if their schemas
are equal and their contents are equal. For example, two unions are equal if and only if their
union indexes are equal and the union members are equal.
      - Two maps (or sets) are equal if and only if their schemas are equal and their elements
are equal, irrespective of order.
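
The equality rules above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not Avro library code: schemas are plain parsed-JSON
values, a set is an array schema carrying the proposed "unique":"true" attribute, union values
are modeled as (branch_index, value) pairs, and comparing decoded primitives with == stands in
for comparing their binary encodings. The names avro_equal, values_equal and same_unordered
are hypothetical.

```python
# Sketch of the proposed equality rules on decoded values (not an Avro API).
LEAF_TYPES = {"null", "boolean", "int", "long", "float", "double",
              "bytes", "string", "enum", "fixed"}


def is_set(schema):
    # Assumption: a set is an array schema with the proposed "unique":"true".
    return (isinstance(schema, dict) and schema.get("type") == "array"
            and schema.get("unique") == "true")


def same_unordered(item_schema, xs, ys):
    # Order-insensitive comparison; quadratic, but works for unhashable items.
    if len(xs) != len(ys):
        return False
    remaining = list(ys)
    for x in xs:
        for i, y in enumerate(remaining):
            if values_equal(item_schema, x, y):
                del remaining[i]
                break
        else:
            return False
    return True


def values_equal(schema, a, b):
    if isinstance(schema, str):
        # Primitive type name; == stands in for binary-encoding equality.
        return a == b
    if isinstance(schema, list):
        # Union: same branch index and equal members.
        return a[0] == b[0] and values_equal(schema[a[0]], a[1], b[1])
    kind = schema["type"]
    if kind in LEAF_TYPES:
        return a == b
    if kind == "map":
        # Maps: same keys, equal values, order ignored.
        return a.keys() == b.keys() and all(
            values_equal(schema["values"], a[k], b[k]) for k in a)
    if kind == "array":
        if is_set(schema):
            return same_unordered(schema["items"], a, b)
        return len(a) == len(b) and all(
            values_equal(schema["items"], x, y) for x, y in zip(a, b))
    if kind == "record":
        return all(values_equal(f["type"], a[f["name"]], b[f["name"]])
                   for f in schema["fields"])
    raise ValueError("schema kind not covered by this sketch: %r" % (kind,))


def avro_equal(schema_a, a, schema_b, b):
    # Every rule requires equal schemas first.
    return schema_a == schema_b and values_equal(schema_a, a, b)
```

With this sketch, [1, 2, 3] and [3, 1, 2] compare equal under a set schema but not under a
plain array schema, and an array never equals a set because their schemas differ.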

Another question to answer is: is it the responsibility of the Avro library to ensure uniqueness?
My answer is no. In Avro we already interpret the contents on the assumption that the schema we
have is indeed the schema that the writer used, so we can trust "unique":"true" as well.

That leads us to the next question: what are the resolution rules between arrays and sets?
My answer is: data written as a set can always be read as an array. Data written as an array
can be read as a set as long as the uniqueness constraint is not violated. This extends our
schema resolution philosophy: we try to be as lenient as possible unless we encounter data violations.
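
A rough Python sketch of these resolution rules follows. It is an illustration only, not Avro
library code: read_as and UniquenessViolation are hypothetical names, schemas are plain
parsed-JSON dicts, a set is an array schema with the proposed "unique":"true", and items are
assumed to be primitives so that == is a fair stand-in for the equality discussed earlier.

```python
# Sketch of array/set resolution as proposed in this comment (not an Avro API).
class UniquenessViolation(ValueError):
    """Hypothetical error: written data breaks the reader's "unique" constraint."""


def is_set(schema):
    # Assumption: a set is an array schema with the proposed "unique":"true".
    return schema.get("type") == "array" and schema.get("unique") == "true"


def read_as(writer_schema, reader_schema, items):
    if writer_schema["items"] != reader_schema["items"]:
        raise ValueError("item schemas differ; not covered by this sketch")
    if is_set(reader_schema) and not is_set(writer_schema):
        # Array read as set: lenient unless the data itself has duplicates.
        seen = []
        for item in items:
            if item in seen:
                raise UniquenessViolation("duplicate element: %r" % (item,))
            seen.append(item)
    # Set read as array, or matching kinds: the writer's schema is trusted,
    # so no uniqueness check is made.
    return list(items)
```

Reading array data as a set succeeds for [1, 2, 3] and fails only for data such as [1, 2, 1];
reading set data as an array never fails in this sketch.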

> Support Set containers
> ----------------------
>                 Key: AVRO-435
>                 URL: https://issues.apache.org/jira/browse/AVRO-435
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Jonathan Ellis
>            Priority: Minor
> Cassandra uses Set as a return type for some methods.  It would be nice to not have to
> use a List as a workaround.
