avro-dev mailing list archives

From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-680) Allow for non-string keys
Date Thu, 12 Feb 2015 02:12:13 GMT

    [ https://issues.apache.org/jira/browse/AVRO-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317431#comment-14317431 ]

Ryan Blue commented on AVRO-680:

Overall, this patch is looking really good. I flagged a few things:

* In {{getNameForNonStringMapRecord}}, an {{UnsupportedEncodingException}} is wrapped in a {{RuntimeException}}
with no error message. I think it should be an {{AvroRuntimeException}} with a sensible error
message explaining what action failed.
* For name generation, what about using easier-to-read names? As long as you don't expect
to find an actual class named {{org.apache.avro.reflect.Pair776ea00e586e8427}}, the name
can be anything fairly unique. Since it won't actually resolve to a class, I'd prefer a simpler
namespace, like "pairs", and it would be great to generate the name from the key and value types.
At least for primitives, this would be a lot more readable: "IntBooleanPair", "LongStringPair".
* This doesn't seem to produce the array-of-pairs schema when I call getSchema. All map schemas
are producing this: {"type":"record","name":"HashMap","namespace":"java.util","fields":[]}.
It does work when I call it on Company.class, so I think it might be a bug.
* Is it possible to use the normal writeArray logic? It looks like it would be easier to change
{{write(schema,datum,encoder)}} so that a non-string map replaces datum with its entry set,
then that set is written as a collection and each {{Map.Entry}} is passed to {{write(schema,datum,encoder)}}
individually. That would eliminate the odd control flow in write and match how such maps are
handled elsewhere with the addition of {{getArrayAsCollection}}.
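
To make the naming suggestion concrete, here is a minimal sketch of generating a readable pair name from the key and value types. It is not taken from the patch: {{PairNames}}, {{shortName}}, {{pairName}}, and {{fullName}} are hypothetical names, and the "pairs" namespace is just the one floated above. Note that simple names can collide (two classes named {{Foo}} in different packages), so a real implementation would likely need a fallback.

```java
// Hypothetical helper, not part of Avro or the AVRO-680 patch.
final class PairNames {
    // Assumed namespace, following the "pairs" suggestion above.
    static final String NAMESPACE = "pairs";

    private PairNames() {}

    /** Short, readable label for a type: int and Integer both become "Int". */
    static String shortName(Class<?> type) {
        if (type == Integer.class) return "Int";
        if (type == Character.class) return "Char";
        // byte[] has the simple name "byte[]"; brackets are not legal in Avro names.
        String simple = type.getSimpleName().replace("[]", "Array");
        if (simple.isEmpty()) return "Anonymous"; // anonymous classes have no simple name
        // Capitalize so primitives such as long and boolean read like their wrappers.
        return Character.toUpperCase(simple.charAt(0)) + simple.substring(1);
    }

    /** Record name built from the key and value types, e.g. "LongStringPair". */
    static String pairName(Class<?> key, Class<?> value) {
        return shortName(key) + shortName(value) + "Pair";
    }

    /** Full name in the assumed namespace, e.g. "pairs.LongStringPair". */
    static String fullName(Class<?> key, Class<?> value) {
        return NAMESPACE + "." + pairName(key, value);
    }

    public static void main(String[] args) {
        System.out.println(pairName(Integer.class, Boolean.class)); // IntBooleanPair
        System.out.println(fullName(long.class, String.class));     // pairs.LongStringPair
    }
}
```

Names built this way are deterministic for a given key/value type pair, which a hash-suffixed name also provides, but they are readable in the generated schema.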

> Allow for non-string keys
> -------------------------
>                 Key: AVRO-680
>                 URL: https://issues.apache.org/jira/browse/AVRO-680
>             Project: Avro
>          Issue Type: Improvement
>    Affects Versions: 1.7.6, 1.7.7
>            Reporter: Jeremy Hanna
>         Attachments: AVRO-680.patch, AVRO-680.patch, PERF_8000_cycles.zip, isMap_Call_Hierarchy.png,
>                      non_string_map_keys.zip, non_string_map_keys2.zip, non_string_map_keys3.zip, non_string_map_keys4.patch,
>                      non_string_map_keys5.patch, non_string_map_keys6.patch, non_string_map_keys7.patch, non_string_map_perf.txt,
>                      non_string_map_perf2.txt, original_perf.txt
> Based on an email thread back in April, Doug Cutting proposed a possible solution for
> having non-string keys:
> Stu Hood wrote:
> > I can understand the reasoning behind AVRO-9, but now I need to look for an alternative
> > to a 'map' that will allow me to store an association of bytes keys to values.
> A map of Foo has the same binary format as an array of records, each
> with a string field and a Foo field.  So an application can use an array
> schema similar to this to represent map-like structures with, e.g.,
> non-string keys.
> Perhaps we could establish standard properties that indicate that a
> given array of records should be represented in a map-like way if
> possible?  E.g.,:
> {"type": "array", "isMap": true, "items": {"type":"record", ...}}
> Doug
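
The array-of-records idea quoted above can be sketched in plain Java, independent of Avro: a map with non-string keys is flattened into a list of key/value pairs (one pair per array item) and rebuilt on the way back. This is only an illustration under stated assumptions; {{MapAsPairs}}, {{Pair}}, {{toPairs}}, and {{fromPairs}} are hypothetical names and do not come from the patch or from Avro.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Avro API.
final class MapAsPairs {
    private MapAsPairs() {}

    /** One array item in the array-of-records form: a key field and a value field. */
    static final class Pair<K, V> {
        final K key;
        final V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Flattens a map into a list of pairs, in the map's iteration order. */
    static <K, V> List<Pair<K, V>> toPairs(Map<K, V> map) {
        List<Pair<K, V>> pairs = new ArrayList<>(map.size());
        for (Map.Entry<K, V> entry : map.entrySet()) {
            pairs.add(new Pair<>(entry.getKey(), entry.getValue()));
        }
        return pairs;
    }

    /** Rebuilds a map from pairs; a later duplicate key overwrites an earlier one. */
    static <K, V> Map<K, V> fromPairs(List<Pair<K, V>> pairs) {
        Map<K, V> map = new LinkedHashMap<>();
        for (Pair<K, V> pair : pairs) {
            map.put(pair.key, pair.value);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> byId = new LinkedHashMap<>();
        byId.put(1, "one");
        byId.put(2, "two");
        List<Pair<Integer, String>> pairs = toPairs(byId);
        System.out.println(fromPairs(pairs).equals(byId)); // true
    }
}
```

One design point the sketch makes visible: an array of pairs can hold duplicate keys, so whatever reads it back as a map has to decide how to handle them. Here the last one wins, which is an arbitrary choice.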
