Date: Thu, 12 Feb 2015 02:12:13 +0000 (UTC)
From: "Ryan Blue (JIRA)"
To: dev@avro.apache.org
Reply-To: dev@avro.apache.org
Subject: [jira] [Commented] (AVRO-680) Allow for non-string keys


    [ https://issues.apache.org/jira/browse/AVRO-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317431#comment-14317431 ]

Ryan Blue commented on AVRO-680:
--------------------------------

Overall, this patch is looking really good. I flagged a few things:
* In {{getNameForNonStringMapRecord}}, an {{UnsupportedEncodingException}} is wrapped in {{RuntimeException}} with no reported error. I think it should be {{AvroRuntimeException}} with a sensible error message explaining what action failed.
* For name generation, what about using easier to read names? As long as you don't expect to find an actual class named {{org.apache.avro.reflect.Pair776ea00e586e8427}}, then the name can be anything fairly unique.
  I'd prefer a simpler namespace since it won't actually find a class, like "pairs", and it would be great to generate the name from the key and value types. At least for primitives, this would be a lot more readable: "IntBooleanPair", "LongStringPair", etc.
* This doesn't seem to produce the array-of-pairs schema when I call getSchema. All map schemas are producing this: {"type":"record","name":"HashMap","namespace":"java.util","fields":[]}. It does work when I call it on Company.class, so I think it might be a bug.
* Is it possible to use the normal writeArray logic? It looks like it would be easier to change {{write(schema,datum,encoder)}} so that a non-string map replaces datum with its entry set, then that set is written as a collection and each {{Map.Entry}} is passed to {{write(schema,datum,encoder)}} individually. That would eliminate the odd control flow in write and match how such maps are handled elsewhere with the addition of {{getArrayAsCollection}}.

> Allow for non-string keys
> -------------------------
>
>                 Key: AVRO-680
>                 URL: https://issues.apache.org/jira/browse/AVRO-680
>             Project: Avro
>          Issue Type: Improvement
>    Affects Versions: 1.7.6, 1.7.7
>            Reporter: Jeremy Hanna
>         Attachments: AVRO-680.patch, AVRO-680.patch, PERF_8000_cycles.zip, isMap_Call_Hierarchy.png, non_string_map_keys.zip, non_string_map_keys2.zip, non_string_map_keys3.zip, non_string_map_keys4.patch, non_string_map_keys5.patch, non_string_map_keys6.patch, non_string_map_keys7.patch, non_string_map_perf.txt, non_string_map_perf2.txt, original_perf.txt
>
>
> Based on an email thread back in April, Doug Cutting proposed a possible solution for having non-string keys:
> Stu Hood wrote:
> > I can understand the reasoning behind AVRO-9, but now I need to look for an alternative to a 'map' that will allow me to store an association of bytes keys to values.
> A map of Foo has the same binary format as an array of records, each
> with a string field and a Foo field.
> So an application can use an array
> schema similar to this to represent map-like structures with, e.g.,
> non-string keys.
> Perhaps we could establish standard properties that indicate that a
> given array of records should be represented in a map-like way if
> possible? E.g.,:
> {"type": "array", "isMap": true, "items": {"type":"record", ...}}
> Doug

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
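
For illustration only, here is a rough sketch of the readable name generation suggested in the comment above ("IntBooleanPair", "LongStringPair"). It is not code from the patch: the class {{PairNames}}, its methods, and the handling of array types are all hypothetical, and the "pairs" namespace is simply the one floated in the comment. It uses only the JDK and does not touch any Avro API.

```java
/**
 * Hypothetical helper (not part of the AVRO-680 patch): derives a readable
 * record name for the key/value pair of a non-string-keyed map.
 */
final class PairNames {
    // Assumed namespace, taken from the "pairs" suggestion in the review comment.
    static final String NAMESPACE = "pairs";

    private PairNames() {}

    /** Capitalized simple name of a type, e.g. int -> "Int", String -> "String". */
    static String label(Class<?> type) {
        // Array types have simple names such as "byte[]"; brackets are not
        // valid in Avro names, so they are spelled out (an assumption of this sketch).
        String simple = type.getSimpleName().replace("[]", "Array");
        if (simple.isEmpty()) {
            // Anonymous classes have no simple name; this sketch does not handle them.
            throw new IllegalArgumentException("No simple name for " + type);
        }
        return Character.toUpperCase(simple.charAt(0)) + simple.substring(1);
    }

    /** Record name built from the key and value types, e.g. "IntBooleanPair". */
    static String pairName(Class<?> key, Class<?> value) {
        return label(key) + label(value) + "Pair";
    }

    /** Namespace-qualified name, e.g. "pairs.LongStringPair". */
    static String fullName(Class<?> key, Class<?> value) {
        return NAMESPACE + "." + pairName(key, value);
    }
}
```

With this sketch, {{PairNames.pairName(int.class, boolean.class)}} gives "IntBooleanPair" and {{PairNames.fullName(long.class, String.class)}} gives "pairs.LongStringPair". Two classes that share a simple name but live in different packages would produce the same generated name, so a real implementation would probably need some disambiguation for non-primitive types.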