avro-dev mailing list archives

From "Adam Warrington (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-712) define memcmp'able encoding
Date Thu, 23 Dec 2010 19:15:45 GMT

    [ https://issues.apache.org/jira/browse/AVRO-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974700#action_12974700 ]

Adam Warrington commented on AVRO-712:
--------------------------------------

Scott Carey:

I think there is real value in being able to use Avro entities with systems that use
memcmp to sort binary data, for example as keys in an HBase table. One could create multi-component
keys for an HBase table using Avro, and have guarantees about how the data will be sorted.
Say I'm storing blog posts in HBase and want to group them by domain, ordered by time within
each domain. I could create an Avro schema:

{ "type": "record",
  "name": "author_blog_key",
  "fields": [
    { "name": "domain", "type": "string" },
    { "name": "timestamp", "type": "long" }
  ]}

If the memcmp sort ordering could be guaranteed, I could use serialized instances of this record
as keys in systems that sort data using memcmp.
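
To illustrate the idea, here is a minimal Python sketch of a memcmp-sortable encoding for the
key above. It is only an assumption about how such an encoding might work, not the encoding
proposed on this issue or in the attached prototype. The function names are hypothetical.
Strings are written as UTF-8 with 0x00 escaped and a 0x00 0x00 terminator; longs are written
as 8 big-endian bytes with the sign bit flipped.

```python
import struct


def encode_string(value: str) -> bytes:
    """UTF-8 bytes, with 0x00 escaped as 0x00 0xFF and a 0x00 0x00 terminator.

    The terminator sorts below every escaped or ordinary byte, so a string
    always sorts before any longer string it is a prefix of.
    """
    raw = value.encode("utf-8")
    return raw.replace(b"\x00", b"\x00\xff") + b"\x00\x00"


def encode_long(value: int) -> bytes:
    """Fixed 8-byte big-endian form, offset so negatives sort before positives."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a 64-bit signed long")
    return struct.pack(">Q", value + (1 << 63))


def encode_key(domain: str, timestamp: int) -> bytes:
    """Concatenate the fields in schema order: domain first, then timestamp."""
    return encode_string(domain) + encode_long(timestamp)


# Python compares bytes objects lexicographically as unsigned bytes, which
# matches memcmp for these keys because no encoded key is a prefix of another.
keys = [("example.org", 5), ("example.com", 10), ("example.com", -1), ("example", 99)]
sorted_by_bytes = sorted(encode_key(d, t) for d, t in keys)
sorted_by_value = [encode_key(d, t) for d, t in sorted(keys)]
assert sorted_by_bytes == sorted_by_value
```

With keys like these, all rows for one domain are contiguous and ordered by timestamp.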

It's clear that this type of encoding will cost more in both time and space, especially
when dealing with bytes, strings, and arrays. Your proposed encoding of ints and
longs is clever, and I like the idea of putting ignored fields at the end of a record if equivalency
isn't required (which in many cases it isn't).
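
To make the trade-off concrete, the sketch below compares Avro's standard zig-zag
variable-length encoding of a long with a fixed-width, sign-flipped big-endian form. The
fixed-width form is only an assumed stand-in for a sortable encoding, not the int/long
encoding proposed earlier in this thread, and the function names are hypothetical.

```python
import struct


def zigzag_varint(value: int) -> bytes:
    """Avro's standard binary encoding of a long: zig-zag, then base-128 varint."""
    n = (value << 1) ^ (value >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, with the continuation bit set
        n >>= 7
    out.append(n)
    return bytes(out)


def fixed_sortable_long(value: int) -> bytes:
    """Assumed sortable form: 8 big-endian bytes with the sign bit flipped."""
    return struct.pack(">Q", value + (1 << 63))


# The standard encoding is compact but does not sort under memcmp:
# -2 is less than 1, yet its encoded bytes compare greater.
assert zigzag_varint(-2) > zigzag_varint(1)

# The fixed-width form sorts correctly but always takes 8 bytes.
assert fixed_sortable_long(-2) < fixed_sortable_long(1)
assert len(zigzag_varint(1)) == 1
assert len(fixed_sortable_long(1)) == 8
```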


> define memcmp'able encoding
> ---------------------------
>
>                 Key: AVRO-712
>                 URL: https://issues.apache.org/jira/browse/AVRO-712
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Doug Cutting
>         Attachments: memcmp_encoding_prototype.py
>
>
> It would be useful to have an encoding for Avro data that ordered data according to the
> Avro specification under memcmp.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

