avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-435) Support Set containers
Date Tue, 02 Mar 2010 04:03:27 GMT

    [ https://issues.apache.org/jira/browse/AVRO-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840001#action_12840001 ]

Scott Carey commented on AVRO-435:
----------------------------------

An ordered set is easily compared to an array; perhaps sets should be ordered.

If things are unordered, comparison gets more complicated.  For example, the simple way to
compare would be to build both sets up in memory -- this would not work for large sets or
lists.

I propose that a set is ordered by default.  A client can achieve this either by sorting when
writing, or by using a data structure like a linked set.  If they wish, a client can choose to
disregard order and accept that set equivalence is then invalid.
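
To illustrate why ordering helps, here is a minimal Java sketch (not part of Avro; the class and method names are hypothetical) that compares two ordered sequences item by item, the way arrays are compared, without building either set up in memory.  It assumes the writer guaranteed order, which the example simulates with a TreeSet.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not an Avro API.
class OrderedSetCompare {
    /**
     * Compares two already-ordered sequences element by element.
     * Only one item from each side is held at a time, so this also
     * works when the items are streamed from a large serialization.
     */
    static <T extends Comparable<T>> int compareOrdered(Iterator<T> a, Iterator<T> b) {
        while (a.hasNext() && b.hasNext()) {
            int c = a.next().compareTo(b.next());
            if (c != 0) {
                return c; // first differing item decides the order
            }
        }
        if (a.hasNext()) {
            return 1;  // a has extra items, so it sorts after b
        }
        if (b.hasNext()) {
            return -1; // b has extra items, so a sorts before b
        }
        return 0;      // same items in the same order
    }

    public static void main(String[] args) {
        // The writer guarantees order by sorting; TreeSet stands in for that step.
        Set<String> left = new TreeSet<>(Arrays.asList("b", "a", "c"));
        Set<String> right = new TreeSet<>(Arrays.asList("c", "a", "b"));
        System.out.println(compareOrdered(left.iterator(), right.iterator()) == 0); // prints "true"
    }
}
```

If the writers had not sorted, the same two sets could be serialized in different item orders and this comparison would report them as unequal, which is the trade-off described above.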

We could consider an "ordered": true property as well.  The implicit default would then be
"unique": false, "ordered": true.
An array or set that is not ordered is always unequal
to another array or set.  A reader with an 'ordered' schema reading an 'unordered' serialization
would be in a tough spot.  That might not be a supported promotion.

{quote}
That leads us to the next question: What are the resolution rules between arrays and sets?
My answer to this is: sets written can always be read as arrays. Arrays written can be read
as sets as long as the uniqueness constraint is not violated.{quote}

One can always read an array as a set, even with duplicates.  The duplicates get eliminated
in the process of creating the set.  Interestingly, one can go either direction, but not back
and forth.
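
A small Java sketch of that point, using plain collections rather than any Avro API (the class and method names are hypothetical): reading array items into a set drops duplicates, and reading the set back as an array does not restore them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an Avro API.
class ArrayAsSet {
    /** Reads array items into a set; duplicates collapse, first-seen order is kept. */
    static <T> Set<T> readAsSet(List<T> items) {
        return new LinkedHashSet<>(items);
    }

    /** Reads set items back out as an array-like list. */
    static <T> List<T> readAsArray(Set<T> items) {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        List<Integer> written = Arrays.asList(3, 1, 3, 2, 1);
        Set<Integer> asSet = readAsSet(written);
        List<Integer> roundTrip = readAsArray(asSet);
        System.out.println(asSet);                     // prints "[3, 1, 2]"
        System.out.println(roundTrip.equals(written)); // prints "false"
    }
}
```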

I think in the short run, Doug's version is appropriate.  The above would be valuable, but it
would take a while to sort out the details of what works best across languages, and it is a spec
change rather than a Java API extension.  Besides, it should be possible to specify what
object to use as an array container regardless.

Taken in combination with AVRO-436, ordered maps, these two things are potentially significant
changes for something that clients can emulate on their own.  Ordered maps can be emulated
by an array of {key, value} tuples.
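
As a rough client-side sketch of that emulation in Java (the class and helper names are hypothetical, and nothing here is an Avro API), an ordered map can be held as a list of {key, value} entries, which would then serialize as an ordinary array of records:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an Avro API.
class OrderedMapAsArray {
    /** Appends a {key, value} tuple; list position preserves the order. */
    static <K, V> void put(List<Map.Entry<K, V>> entries, K key, V value) {
        entries.add(new SimpleImmutableEntry<>(key, value));
    }

    /** Linear lookup over the tuples; returns null if the key is absent. */
    static <K, V> V get(List<Map.Entry<K, V>> entries, K key) {
        for (Map.Entry<K, V> e : entries) {
            if (e.getKey().equals(key)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        put(entries, "zeta", 1);
        put(entries, "alpha", 2);
        System.out.println(entries.get(0).getKey()); // prints "zeta"
        System.out.println(get(entries, "alpha"));   // prints "2"
    }
}
```

This sketch does not enforce key uniqueness; a client that needs it would have to check before appending.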
The simplest option other than Doug's Java Reflect API version is to require that Avro sets
are ordered for the purposes of equivalence and comparison, and if a client wants to compare
two objects for equality or sort order they must guarantee order on writing (this restriction
already applies when serializing sets as arrays).
"unique": (true|false) is then just a reserved keyword hint for languages to construct
Set-like APIs for data access.
We can consider adding the more difficult support for unordered sets/lists incrementally.

> Support Set containers
> ----------------------
>
>                 Key: AVRO-435
>                 URL: https://issues.apache.org/jira/browse/AVRO-435
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Cassandra uses Set as a return type for some methods.  It would be nice to not have to
> use a List as a workaround.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

