<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>user@avro.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/avro-user/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/avro-user/"/>
<id>http://mail-archives.apache.org/mod_mbox/avro-user/</id>
<updated>2013-05-24T06:25:56Z</updated>
<entry>
<title>Re: Is Avro right for me?</title>
<author><name>Sean Busbey &lt;busbey@cloudera.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAGHyZ6LQRtucasJJtMxea7xmsq_mGb-xBLEkTVSk7vLR=9bSwA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAGHyZ6LQRtucasJJtMxea7xmsq_mGb-xBLEkTVSk7vLR=9bSwA@mail-gmail-com%3e</id>
<updated>2013-05-24T04:16:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Yep. Avro would be great at that (provided your central consumer is Avro&#010;friendly, like a Hadoop system).  Make sure that all of your schemas have&#010;default values defined for fields so that schema evolution will be easier&#010;in the future.&#010;&#010;&#010;On Thu, May 23, 2013 at 4:29 PM, Mark &lt;static.void.dev@gmail.com&gt; wrote:&#010;&#010;&gt; We're thinking about generating logs and events with Avro and shipping&#010;&gt; them to a central collector service via Flume. Is this a valid use case?&#010;&gt;&#010;&gt;&#010;&#010;&#010;-- &#010;Sean&#010;&#010;
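The advice about defaults can be checked mechanically. Below is a minimal Python sketch, using only the standard library rather than an Avro library, that lists the record fields in a schema that declare no default. The LogEvent schema and the helper name fields_missing_defaults are hypothetical, chosen only for illustration.

```python
import json

def fields_missing_defaults(schema, path=""):
    """Yield dotted paths of record fields that declare no "default"."""
    if isinstance(schema, list):          # union: check every branch
        for branch in schema:
            yield from fields_missing_defaults(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema["fields"]:
                field_path = path + "." + field["name"] if path else field["name"]
                if "default" not in field:
                    yield field_path
                # Descend into nested records, arrays, maps and unions.
                yield from fields_missing_defaults(field["type"], field_path)
        elif kind == "array":
            yield from fields_missing_defaults(schema["items"], path)
        elif kind == "map":
            yield from fields_missing_defaults(schema["values"], path)

# Hypothetical log-event schema, for illustration only.
LOG_EVENT = json.loads("""
{"type": "record", "name": "LogEvent", "fields": [
    {"name": "message", "type": "string"},
    {"name": "level", "type": "string", "default": "INFO"},
    {"name": "source", "type": {"type": "record", "name": "Source", "fields": [
        {"name": "host", "type": "string"}
    ]}}
]}
""")

print(list(fields_missing_defaults(LOG_EVENT)))
```

Every path it prints is a field that older or newer readers could not fill in on their own.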
</pre>
</div>
</content>
</entry>
<entry>
<title>Ruby RPC</title>
<author><name>Mark &lt;static.void.dev@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c3C18DD8B-8FB1-4203-B8A9-B805DA9814D6@gmail.com%3e"/>
<id>urn:uuid:%3c3C18DD8B-8FB1-4203-B8A9-B805DA9814D6@gmail-com%3e</id>
<updated>2013-05-23T20:33:45Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Does anyone know why the Ruby RPC example client doesn't work with 1.8.7 (I haven't&#010;tried any other versions)? - https://github.com/phunt/avro-rpc-quickstart/tree/master/src/main/ruby&#010;&#010;I keep receiving the following error:&#010;&#010;read buffer_length': Socket read 0 bytes. (Avro::IPC::ConnectionClosedException)&#010;&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Is Avro right for me?</title>
<author><name>Mark &lt;static.void.dev@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c5E1C8602-0CFE-4E42-A4BF-FF9A009AAA7D@gmail.com%3e"/>
<id>urn:uuid:%3c5E1C8602-0CFE-4E42-A4BF-FF9A009AAA7D@gmail-com%3e</id>
<updated>2013-05-23T20:29:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
We're thinking about generating logs and events with Avro and shipping them to a central collector&#010;service via Flume. Is this a valid use case?&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: using Avro unions with HIVE</title>
<author><name>Mark Wagner &lt;wagner.mark.d@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAN6RUqX5L34xTf=KLnyPW4Hn=qSZuWgLg9m5j0xTQ_=-GzMNGQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAN6RUqX5L34xTf=KLnyPW4Hn=qSZuWgLg9m5j0xTQ_=-GzMNGQ@mail-gmail-com%3e</id>
<updated>2013-05-23T20:08:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Ran,&#010;&#010;Unfortunately, there's no real way to manipulate unions in Hive. The Avro&#010;SerDe translates Avro unions into Hive unions correctly, but the support&#010;for accessing those fields is not there. The exception to this is the&#010;[null, T] pattern for nullable fields, which is handled by the Avro SerDe&#010;transparently. This JIRA is tracking improved union support for Hive, but&#010;it's not being actively worked on:&#010;https://issues.apache.org/jira/browse/HIVE-2390.&#010;&#010;Thanks,&#010;Mark&#010;&#010;&#010;On Thu, May 23, 2013 at 11:45 AM, Scott Carey &lt;scottcarey@apache.org&gt; wrote:&#010;&#010;&gt; The Hive mailing list would have more info on the Avro SerDe usage.&#010;&gt;&#010;&gt; In general, a system that does not have union types like Hive (or Pig,&#010;&gt; etc) has to expand a union into multiple fields if there is more than one&#010;&gt; non-null type -- and at most one branch of the union is not null.&#010;&gt;&#010;&gt; For example a record with fields:&#010;&gt;&#010;&gt;   {"name":"timestamp", "type":"long", "default":-1}&#010;&gt;   {"name":"ipAddress", "type":["IPv4", "IPv6"]}&#010;&gt;&#010;&gt; where IPv4 and IPv6 are previously defined types, would have to expand to&#010;&gt; three fields&#010;&gt;  "timestamp", "ipAddress:IPv4", and "ipAddress:IPv6", where only one of&#010;&gt; the last two is not null in any given record.&#010;&gt;&#010;&gt; I do not know what Hive's Avro SerDe does with unions.&#010;&gt;&#010;&gt; On 5/23/13 7:15 AM, "Ran S" &lt;rans@liveperson.com&gt; wrote:&#010;&gt;&#010;&gt; &gt;Hi,&#010;&gt; &gt;We started to work with Avro in CDH4 and to query the Avro files using&#010;&gt; &gt;Hive.&#010;&gt; &gt;This does work fine for us, except for unions.&#010;&gt; &gt;We do not understand how to query the data inside a union using Hive.&#010;&gt; &gt;&#010;&gt; &gt;For example, let's look at the following schema:&#010;&gt; &gt;&#010;&gt; &gt;{&#010;&gt; &gt;       "type":"record",&#010;&gt; &gt; 
      "name":"event",&#010;&gt; &gt;       "namespace":"com.mysite",&#010;&gt; &gt;       "fields":[&#010;&gt; &gt;    {&#010;&gt; &gt;        "name":"header",&#010;&gt; &gt;        "type":{&#010;&gt; &gt;            "type":"record", "name":"CommonHeader",&#010;&gt; &gt;            "fields":[{ "name":"eventTimeStamp", "type":"long", "default":-1&#010;&gt; &gt;},&#010;&gt; &gt;                      { "name":"globalUserId", "type":["null", "string"],&#010;&gt; &gt;"default":null } ]&#010;&gt; &gt;        },&#010;&gt; &gt;        "default":null&#010;&gt; &gt;    },&#010;&gt; &gt;    {&#010;&gt; &gt;        "name":"eventbody",&#010;&gt; &gt;        "type":{&#010;&gt; &gt;            "type":"record", "name":"eventbody",&#010;&gt; &gt;            "fields":[&#010;&gt; &gt;                {&#010;&gt; &gt;                    "name":"body",&#010;&gt; &gt;                    "type":[&#010;&gt; &gt;                       "null",&#010;&gt; &gt;                       {&#010;&gt; &gt;                        "type":"record",&#010;&gt; &gt;                        "name":"event1",&#010;&gt; &gt;                        "fields":[&#010;&gt; &gt;                            {&#010;&gt; &gt;                                "name":"event1Header",&#010;&gt; &gt;                                "type":["null", { "type":"array",&#010;&gt; &gt;"items":"string" }], "default":null&#010;&gt; &gt;                            },&#010;&gt; &gt;                            {&#010;&gt; &gt;                                "name":"event1Body",&#010;&gt; &gt;                                "type":["null", { "type":"array",&#010;&gt; &gt;"items":"string" }], "default":null&#010;&gt; &gt;                            }&#010;&gt; &gt;                        ]&#010;&gt; &gt;                    },&#010;&gt; &gt;                   {&#010;&gt; &gt;                        "type":"record",&#010;&gt; &gt;                        "name":"event2",&#010;&gt; &gt;                        "fields":[&#010;&gt; &gt;               
             {&#010;&gt; &gt;                                "name":"page",&#010;&gt; &gt;                                "type":{&#010;&gt; &gt;                                    "type":"record", "name":"URL",&#010;&gt; &gt;"fields":[{ "name":"url", "type":"string" }]&#010;&gt; &gt;                                },&#010;&gt; &gt;                                "default":null&#010;&gt; &gt;                            },&#010;&gt; &gt;                            {&#010;&gt; &gt;                                "name":"referrer", "type":"string",&#010;&gt; &gt;"default":null&#010;&gt; &gt;                            }&#010;&gt; &gt;                        ]&#010;&gt; &gt;                    }&#010;&gt; &gt;               ],&#010;&gt; &gt;                    "default":null&#010;&gt; &gt;                }&#010;&gt; &gt;            ]&#010;&gt; &gt;        },&#010;&gt; &gt;        "default":null&#010;&gt; &gt;    }&#010;&gt; &gt;]}&#010;&gt; &gt;&#010;&gt; &gt;Note that "body" is a union of three types:&#010;&gt; &gt;null, "event1" and "event2"&#010;&gt; &gt;&#010;&gt; &gt;So if I want to query fields inside event1, I first need to access it.&#010;&gt; &gt;I then set a HiveQL like this:&#010;&gt; &gt;SELECT eventbody.body.??? from SRC&#010;&gt; &gt;&#010;&gt; &gt;My question is: what should I put in the ??? above to make this work?&#010;&gt; &gt;&#010;&gt; &gt;Thank you,&#010;&gt; &gt;Ran&#010;&gt; &gt;&#010;&gt; &gt;&#010;&gt; &gt;&#010;&gt; &gt;--&#010;&gt; &gt;View this message in context:&#010;&gt; &gt;&#010;&gt; http://apache-avro.679487.n3.nabble.com/using-Avro-unions-with-HIVE-tp4027&#010;&gt; &gt;473.html&#010;&gt; &gt;Sent from the Avro - Users mailing list archive at Nabble.com.&#010;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Newb question on importing JSON and defaults</title>
<author><name>&quot;Gregory (Grisha) Trubetskoy&quot; &lt;grisha@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cPine.OSX.4.64-MacPine.1305231606350.46342@ourmini.ispol.com%3e"/>
<id>urn:uuid:%3cPine-OSX-4-64-MacPine-1305231606350-46342@ourmini-ispol-com%3e</id>
<updated>2013-05-23T20:07:26Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&#010;Thanks Scott!&#010;&#010;So it looks like "fromjson" is mainly meant for processing JSON generated &#010;by "tojson" and not as a general "JSON importing tool" (although it could &#010;be used as such) - it's probably my short attention span, but somehow that &#010;point got lost on me. (As I later learned it also seems that the schema &#010;that the fromjson expects is a simplified version - e.g. specifying a &#010;union will give an error.)&#010;&#010;So if I expect to be dealing with data coming in as JSON and would need to &#010;be converting it to Avro - the current "best practice" is to write a &#010;program of your own? This seems like a fairly common thing to do, perhaps &#010;if there isn't a general tool, this could be something useful to hack on &#010;for the Avro project...&#010;&#010;Grisha&#010;&#010;On Thu, 23 May 2013, Scott Carey wrote:&#010;&#010;&gt;&#010;&gt;&#010;&gt; On 5/22/13 2:26 PM, "Gregory (Grisha) Trubetskoy" &lt;grisha@apache.org&gt;&#010;&gt; wrote:&#010;&gt;&#010;&gt;&gt;&#010;&gt;&gt; Hello!&#010;&gt;&gt;&#010;&gt;&gt; I have a test.json file that looks like this:&#010;&gt;&gt;&#010;&gt;&gt; {"first":"John", "last":"Doe", "middle":"C"}&#010;&gt;&gt; {"first":"John", "last":"Doe"}&#010;&gt;&gt;&#010;&gt;&gt; (Second line does NOT have a "middle" element).&#010;&gt;&gt;&#010;&gt;&gt; And I have a test.schema file that looks like this:&#010;&gt;&gt;&#010;&gt;&gt; {"name":"test",&#010;&gt;&gt;  "type":"record",&#010;&gt;&gt;  "fields": [&#010;&gt;&gt;     {"name":"first",  "type":"string"},&#010;&gt;&gt;     {"name":"middle", "type":"string", "default":""},&#010;&gt;&gt;     {"name":"last",   "type":"string"}&#010;&gt;&gt; ]}&#010;&gt;&gt;&#010;&gt;&gt; I then try to use fromjson, as follows, and it chokes on the second line:&#010;&gt;&gt;&#010;&gt;&gt; $ java -jar avro-tools-1.7.4.jar fromjson --schema-file test.schema&#010;&gt;&gt; test.json &gt; test.avro&#010;&gt;&gt; Exception in thread "main" 
org.apache.avro.AvroTypeException: Expected&#010;&gt;&gt; field name not found: middle&#010;&gt;&gt;         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)&#010;&gt;&gt;         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)&#010;&gt;&gt;         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107&#010;&gt;&gt; )&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.j&#010;&gt;&gt; ava:348)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.j&#010;&gt;&gt; ava:341)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:15&#010;&gt;&gt; 4)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.j&#010;&gt;&gt; ava:177)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:14&#010;&gt;&gt; 8)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:13&#010;&gt;&gt; 9)&#010;&gt;&gt;         at&#010;&gt;&gt; org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:105)&#010;&gt;&gt;         at org.apache.avro.tool.Main.run(Main.java:80)&#010;&gt;&gt;         at org.apache.avro.tool.Main.main(Main.java:69)&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; The short story is - I need to convert a bunch of JSON where an element&#010;&gt;&gt; may not be present sometimes, in which case I'd want it to default to&#010;&gt;&gt; something sensible, e.g. 
blank or null.&#010;&gt;&gt;&#010;&gt;&gt; According to the Schema Resolution "if the reader's record schema has a&#010;&gt;&gt; field that contains a default value, and writer's schema does not have a&#010;&gt;&gt; field with the same name, then the reader should use the default value&#010;&gt;&gt; from its field."&#010;&gt;&gt;&#010;&gt;&gt; I'm clearly missing something obvious, any help would be appreciated!&#010;&gt;&#010;&gt; There are two things that seem to be missing here:&#010;&gt; 1. The fromjson tool is configuring the "writer's schema" (and readers's)&#010;&gt; the one you provided.   Avro is expecting every&#010;&gt; JSON fragment you are giving it to have the same schema.&#010;&gt; 2. The tool will not work for all arbitrary json, it expects json in the&#010;&gt; format that the Avro JSON Encoder writes.  There are a few differences&#010;&gt; with expectations, primarily when disambiguating union types and maps from&#010;&gt; records.&#010;&gt;&#010;&gt; To perform schema evolution while reading, you may need to separate json&#010;&gt; fragments missing "middle" from those that have it, and run the tool&#010;&gt; twice, with corresponding schemas for each case.&#010;&gt; Alternatively the tool could be modified to handle schema resolution or&#010;&gt; deal with different json encodings as&#010;&gt; well(tools/src/main/java/org/apache/avro/tool/DataFileWriteTool).&#010;&gt;&#010;&gt; Alternatively, you can avoid schema resolution and write two files, one&#010;&gt; with data in each schema after separating the records.   Then you can deal&#010;&gt; with schema resolution in a later pass through the data with other tools&#010;&gt; (e.g. data file reader + writer), or only lazily&#010;&gt; when reading resolve the data into the schema you wish.&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;&gt;&#010;&gt;&gt; Grisha&#010;&gt;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
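As a rough idea of what such a "program of your own" could look like, here is a minimal Python sketch that fills in missing fields from the schema defaults before the lines are handed to fromjson. It assumes a flat record of plain strings, where the Avro JSON encoding coincides with ordinary JSON; unions and nested types would need more work. The helper name fill_defaults is hypothetical.

```python
import json

# The schema from the quoted message (test.schema).
SCHEMA = {
    "name": "test",
    "type": "record",
    "fields": [
        {"name": "first", "type": "string"},
        {"name": "middle", "type": "string", "default": ""},
        {"name": "last", "type": "string"},
    ],
}

def fill_defaults(line, schema):
    """Return one JSON line with every schema field present, in schema order."""
    record = json.loads(line)
    filled = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in record:
            filled[name] = record[name]
        elif "default" in field:
            filled[name] = field["default"]
        else:
            raise ValueError("missing field without default: " + name)
    return json.dumps(filled)

lines = [
    '{"first":"John", "last":"Doe", "middle":"C"}',
    '{"first":"John", "last":"Doe"}',
]
for line in lines:
    print(fill_defaults(line, SCHEMA))
```

After this pass every line carries a "middle" value, so all lines share the one schema.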
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Newb question on importing JSON and defaults</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDC3B9C5.F29D1%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDC3B9C5-F29D1%25scott@richrelevance-com%3e</id>
<updated>2013-05-23T19:24:06Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&#010;&#010;On 5/22/13 2:26 PM, "Gregory (Grisha) Trubetskoy" &lt;grisha@apache.org&gt;&#010;wrote:&#010;&#010;&gt;&#010;&gt;Hello!&#010;&gt;&#010;&gt;I have a test.json file that looks like this:&#010;&gt;&#010;&gt;{"first":"John", "last":"Doe", "middle":"C"}&#010;&gt;{"first":"John", "last":"Doe"}&#010;&gt;&#010;&gt;(Second line does NOT have a "middle" element).&#010;&gt;&#010;&gt;And I have a test.schema file that looks like this:&#010;&gt;&#010;&gt;{"name":"test",&#010;&gt;  "type":"record",&#010;&gt;  "fields": [&#010;&gt;     {"name":"first",  "type":"string"},&#010;&gt;     {"name":"middle", "type":"string", "default":""},&#010;&gt;     {"name":"last",   "type":"string"}&#010;&gt;]}&#010;&gt;&#010;&gt;I then try to use fromjson, as follows, and it chokes on the second line:&#010;&gt;&#010;&gt;$ java -jar avro-tools-1.7.4.jar fromjson --schema-file test.schema&#010;&gt;test.json &gt; test.avro&#010;&gt;Exception in thread "main" org.apache.avro.AvroTypeException: Expected&#010;&gt;field name not found: middle&#010;&gt;         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)&#010;&gt;         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)&#010;&gt;         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)&#010;&gt;         at &#010;&gt;org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)&#010;&gt;         at &#010;&gt;org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)&#010;&gt;         at &#010;&gt;org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107&#010;&gt;)&#010;&gt;         at &#010;&gt;org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.j&#010;&gt;ava:348)&#010;&gt;         at &#010;&gt;org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.j&#010;&gt;ava:341)&#010;&gt;         at &#010;&gt;org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:15&#010;&gt;4)&#010;&gt;         at 
&#010;&gt;org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.j&#010;&gt;ava:177)&#010;&gt;         at &#010;&gt;org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:14&#010;&gt;8)&#010;&gt;         at &#010;&gt;org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:13&#010;&gt;9)&#010;&gt;         at &#010;&gt;org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:105)&#010;&gt;         at org.apache.avro.tool.Main.run(Main.java:80)&#010;&gt;         at org.apache.avro.tool.Main.main(Main.java:69)&#010;&gt;&#010;&gt;&#010;&gt;The short story is - I need to convert a bunch of JSON where an element&#010;&gt;may not be present sometimes, in which case I'd want it to default to&#010;&gt;something sensible, e.g. blank or null.&#010;&gt;&#010;&gt;According to the Schema Resolution "if the reader's record schema has a&#010;&gt;field that contains a default value, and writer's schema does not have a&#010;&gt;field with the same name, then the reader should use the default value&#010;&gt;from its field."&#010;&gt;&#010;&gt;I'm clearly missing something obvious, any help would be appreciated!&#010;&#010;There are two things that seem to be missing here:&#010; 1. The fromjson tool is configuring the "writer's schema" (and readers's)&#010;the one you provided.   Avro is expecting every&#010;JSON fragment you are giving it to have the same schema.&#010; 2. The tool will not work for all arbitrary json, it expects json in the&#010;format that the Avro JSON Encoder writes.  
There are a few differences&#010;with expectations, primarily when disambiguating union types and maps from&#010;records.&#010;&#010;To perform schema evolution while reading, you may need to separate json&#010;fragments missing "middle" from those that have it, and run the tool&#010;twice, with corresponding schemas for each case.&#010;Alternatively the tool could be modified to handle schema resolution or&#010;deal with different json encodings as&#010;well(tools/src/main/java/org/apache/avro/tool/DataFileWriteTool).&#010;&#010;Alternatively, you can avoid schema resolution and write two files, one&#010;with data in each schema after separating the records.   Then you can deal&#010;with schema resolution in a later pass through the data with other tools&#010;(e.g. data file reader + writer), or only lazily&#010;when reading resolve the data into the schema you wish.&#010;&#010;&#010;&#010;&gt;&#010;&gt;Grisha&#010;&gt;&#010;&#010;&#010;&#010;
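The "separate the fragments" step can be sketched in a few lines of Python. This is only an illustration using the standard library: it groups JSON lines by the set of top-level fields they contain, so that each group can be written to its own file and converted with a matching schema. The helper name split_by_fields is hypothetical.

```python
import json
from collections import defaultdict

def split_by_fields(lines):
    """Group JSON lines by the sorted tuple of top-level keys they contain."""
    groups = defaultdict(list)
    for line in lines:
        keys = tuple(sorted(json.loads(line)))
        groups[keys].append(line)
    return dict(groups)

lines = [
    '{"first":"John", "last":"Doe", "middle":"C"}',
    '{"first":"John", "last":"Doe"}',
]
for keys, group in split_by_fields(lines).items():
    print(keys, len(group))
```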
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: ETL in face of column renames</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDC3B7BA.F2994%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDC3B7BA-F2994%25scott@richrelevance-com%3e</id>
<updated>2013-05-23T19:16:38Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&#010;&#010;On 5/22/13 10:34 AM, "Mason" &lt;mason@verbasoftware.com&gt; wrote:&#010;&#010;&gt;dear list,&#010;&gt;&#010;&gt;I have what I imagine is a standard setup: a web application generates&#010;&gt;data in MySQL, which I want to analyze in Hadoop; I run a nightly&#010;&gt;process to extract tables of interest, Avroize, and dump into HDFS.&#010;&gt;&#010;&gt;This has worked great so far because the tools I'm using make it easy to&#010;&gt;load a directory tree of Avros with the same schema.&#010;&gt;&#010;&gt;The issue is what to do when schema changes occur in the SQL database. I&#010;&gt;believe column additions and deletions are handled automatically by the&#010;&gt;Avro loaders I'm using, but I need to deal with a column rename.&#010;&gt;&#010;&gt;My thinking is: I could bake the table schemas at time of ETL into the&#010;&gt;Avros, for historical record, but then manually copy that schema out as&#010;&gt;a "master" schema and apply it to all Avros for which it's appropriate;&#010;&gt;then when a column rename occurs, go back and edit the master schema.&#010;&gt;&#010;&gt;I've never used an external schema before, so please correct if I&#010;&gt;misunderstand how they work.&#010;&gt;&#010;&gt;Anyone have wisdom to share on this topic? I'd love to hear from anyone&#010;&gt;who has done this, or has a better solution.&#010;&#010;The first thing that comes to mind is the alias feature for field names:&#010;http://avro.apache.org/docs/current/spec.html#Aliases&#010;&#010;If you are using Avro data files, these contain the schemas at the time&#010;of writing for "historical record".&#010;&#010;The trick is being able to distinguish between someone who renamed a&#010;column from "foo" to "fubar" and a case where "foo" was removed and&#010;"foobar" added.  
To do this, one has to have knowledge from the SQL&#010;database DDL changes.&#010;&#010;Once you have this, you can choose your reader schema appropriately --&#010;likely by using the 'latest' schema decorated with field aliases where&#010;appropriate, but there are other options.&#010;&#010;&#010;&gt;&#010;&gt;-Mason&#010;&#010;&#010;&#010;
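To make the alias idea concrete, here is a minimal Python sketch of how a reader schema decorated with field aliases maps old field names onto new ones. It is a plain-Python illustration of the rule, not the Avro library's implementation; the field list and the helper name apply_aliases are hypothetical.

```python
# Hypothetical reader fields: the column "foo" was renamed to "fubar".
READER_FIELDS = [
    {"name": "id", "type": "long"},
    {"name": "fubar", "type": "string", "aliases": ["foo"]},
]

def apply_aliases(written, reader_fields):
    """Map a record written under old field names onto the reader's names."""
    resolved = {}
    for field in reader_fields:
        # Try the current name first, then each alias in order.
        candidates = [field["name"]] + field.get("aliases", [])
        for candidate in candidates:
            if candidate in written:
                resolved[field["name"]] = written[candidate]
                break
    return resolved

old_row = {"id": 1, "foo": "x"}    # written before the rename
new_row = {"id": 2, "fubar": "y"}  # written after the rename
print(apply_aliases(old_row, READER_FIELDS))
print(apply_aliases(new_row, READER_FIELDS))
```

Both rows come out keyed by "fubar", which is why only the reader schema needs editing when a column is renamed.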
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: why isn't there an option for a default enum so that the writer can add new enum symbols</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDC3B239.F28F1%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDC3B239-F28F1%25scott@richrelevance-com%3e</id>
<updated>2013-05-23T18:49:55Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I think this is simply a feature that has not been added.  It appears at&#010;quick glance to be a compatible specification change (it does not break&#010;old code or change the binary format).&#010;&#010;Please open a JIRA ticket explaining the use case and we can continue&#010;discussion there.&#010;&#010;On 5/23/13 7:51 AM, "Jim Donofrio" &lt;donofrio111@gmail.com&gt; wrote:&#010;&#010;&gt;The schema resolution page says:&#010;&gt;&#010;&gt; &gt; if both are enums:&#010;&gt; &gt; if the writer's symbol is not present in the reader's enum, then an&#010;&gt;error is signalled.&#010;&gt;&#010;&gt;Is there a reason you could not allow us to provide one of the symbols&#010;&gt;as a default in the reader so that when the reader reads an enum with a&#010;&gt;symbol it doesn't have, that new symbol would get defaulted to the default&#010;&gt;symbol we provide?&#010;&gt;&#010;&gt;For example with the below schemas this would currently fail if the old&#010;&gt;reader encountered an Enum1.C in the data. Why not provide users the&#010;&gt;option to set "default": "A" for example so that any unknown symbols are&#010;&gt;binned into one? The user might have an unknown enum or something.&#010;&gt;Currently I was going to implement this in my application by just using&#010;&gt;an int type and looking up the ordinal of my own Java Enum. 
If the int&#010;&gt;is larger than any of the ordinals, the enum would be set to my own&#010;&gt;default value in the enum.&#010;&gt;&#010;&gt;Thanks.&#010;&gt;&#010;&gt;Original schema:&#010;&gt;&#010;&gt;{&#010;&gt;     "type": "record",&#010;&gt;     "name": "EnumExample",&#010;&gt;     "fields": [&#010;&gt;         {&#010;&gt;             "name": "enum1",&#010;&gt;             "type": {&#010;&gt;                 "type": "enum",&#010;&gt;                 "name": "Enum1",&#010;&gt;                 "symbols": ["A", "B"]&#010;&gt;             }&#010;&gt;         }&#010;&gt;     ]&#010;&gt;}&#010;&gt;&#010;&gt;New schema:&#010;&gt;&#010;&gt;{&#010;&gt;     "type": "record",&#010;&gt;     "name": "EnumExample",&#010;&gt;     "fields": [&#010;&gt;         {&#010;&gt;             "name": "enum1",&#010;&gt;             "type": {&#010;&gt;                 "type": "enum",&#010;&gt;                 "name": "Enum1",&#010;&gt;                 "symbols": ["A", "B", "C"]&#010;&gt;             }&#010;&gt;         }&#010;&gt;     ]&#010;&gt;}&#010;&gt;&#010;&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: using Avro unions with HIVE</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDC3B02C.F28B1%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDC3B02C-F28B1%25scott@richrelevance-com%3e</id>
<updated>2013-05-23T18:45:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The Hive mailing list would have more info on the Avro SerDe usage.&#010;&#010;In general, a system that does not have union types like Hive (or Pig,&#010;etc) has to expand a union into multiple fields if there is more than one&#010;non-null type -- and at most one branch of the union is not null.&#010;&#010;For example a record with fields:&#010;&#010;  {"name":"timestamp", "type":"long", "default":-1}&#010;  {"name":"ipAddress", "type":["IPv4", "IPv6"]}&#010;&#010;where IPv4 and IPv6 are previously defined types, would have to expand to&#010;three fields&#010; "timestamp", "ipAddress:IPv4", and "ipAddress:IPv6", where only one of&#010;the last two is not null in any given record.&#010;&#010;I do not know what Hive's Avro SerDe does with unions.&#010;&#010;On 5/23/13 7:15 AM, "Ran S" &lt;rans@liveperson.com&gt; wrote:&#010;&#010;&gt;Hi,&#010;&gt;We started to work with Avro in CDH4 and to query the Avro files using&#010;&gt;Hive.&#010;&gt;This does work fine for us, except for unions.&#010;&gt;We do not understand how to query the data inside a union using Hive.&#010;&gt;&#010;&gt;For example, let's look at the following schema:&#010;&gt;&#010;&gt;{&#010;&gt;&#009;"type":"record", &#010;&gt;&#009;"name":"event", &#010;&gt;&#009;"namespace":"com.mysite",&#010;&gt;&#009;"fields":[&#010;&gt;    {&#010;&gt;        "name":"header",&#010;&gt;        "type":{&#010;&gt;            "type":"record", "name":"CommonHeader",&#010;&gt;            "fields":[{ "name":"eventTimeStamp", "type":"long", "default":-1&#010;&gt;},&#010;&gt;                      { "name":"globalUserId", "type":["null", "string"],&#010;&gt;"default":null } ]&#010;&gt;        },&#010;&gt;        "default":null&#010;&gt;    },&#010;&gt;    {&#010;&gt;        "name":"eventbody",&#010;&gt;        "type":{&#010;&gt;            "type":"record", "name":"eventbody",&#010;&gt;            "fields":[&#010;&gt;                {&#010;&gt;                    "name":"body",&#010;&gt;                    
"type":[&#010;&gt;                       "null",&#010;&gt;                       {&#010;&gt;                        "type":"record",&#010;&gt;                        "name":"event1",&#010;&gt;                        "fields":[&#010;&gt;                            {&#010;&gt;                                "name":"event1Header",&#010;&gt;                                "type":["null", { "type":"array",&#010;&gt;"items":"string" }], "default":null&#010;&gt;                            },&#010;&gt;                            {&#010;&gt;                                "name":"event1Body",&#010;&gt;                                "type":["null", { "type":"array",&#010;&gt;"items":"string" }], "default":null&#010;&gt;                            }&#010;&gt;                        ]&#010;&gt;                    },&#010;&gt;                   {&#010;&gt;                        "type":"record",&#010;&gt;                        "name":"event2",&#010;&gt;                        "fields":[&#010;&gt;                            {&#010;&gt;                                "name":"page",&#010;&gt;                                "type":{&#010;&gt;                                    "type":"record", "name":"URL",&#010;&gt;"fields":[{ "name":"url", "type":"string" }]&#010;&gt;                                },&#010;&gt;                                "default":null&#010;&gt;                            },&#010;&gt;                            {&#010;&gt;                                "name":"referrer", "type":"string",&#010;&gt;"default":null&#010;&gt;                            }&#010;&gt;                        ]&#010;&gt;                    }&#010;&gt;&#009;&#009;],&#010;&gt;                    "default":null&#010;&gt;                }&#010;&gt;            ]&#010;&gt;        },&#010;&gt;        "default":null&#010;&gt;    }&#010;&gt;]}&#010;&gt;&#010;&gt;Note that "body" is a union of three types:&#010;&gt;null, "event1" and "event2"&#010;&gt;&#010;&gt;So if I want to query fields 
inside event1, I first need to access it.&#010;&gt;I then set a HiveQL like this:&#010;&gt;SELECT eventbody.body.??? from SRC&#010;&gt;&#010;&gt;My question is: what should I put in the ??? above to make this work?&#010;&gt;&#010;&gt;Thank you,&#010;&gt;Ran&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;--&#010;&gt;View this message in context:&#010;&gt;http://apache-avro.679487.n3.nabble.com/using-Avro-unions-with-HIVE-tp4027&#010;&gt;473.html&#010;&gt;Sent from the Avro - Users mailing list archive at Nabble.com.&#010;&#010;&#010;&#010;
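The general expansion described at the top of this message (one column per union branch, at most one of them non-null) can be sketched in Python as follows. This illustrates only that general idea and says nothing about what Hive's Avro SerDe actually does; the helper name expand_union and the sample values are hypothetical.

```python
def expand_union(field_name, branch_names, branch, value):
    """Flatten one union value into one column per branch; only `branch` is non-null."""
    if branch not in branch_names:
        raise ValueError("unknown union branch: " + branch)
    return {
        field_name + ":" + name: (value if name == branch else None)
        for name in branch_names
    }

# A record whose ipAddress union currently holds the IPv4 branch.
row = {"timestamp": 1369332944}
row.update(expand_union("ipAddress", ["IPv4", "IPv6"], "IPv4", "192.0.2.1"))
print(row)
```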
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Compressed Avro vs. compressed Sequence - unexpected results?</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDC3AE35.F2876%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDC3AE35-F2876%25scott@richrelevance-com%3e</id>
<updated>2013-05-23T18:38:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
For your avro files, double check that Snappy is used (use avro-tools to&#010;peek at the metadata in the file, or simply view the head in a text&#010;editor; the compression codec used will be in the header).&#010;&#010;Snappy is very fast, so most likely the time to read is dominated by&#010;deserialization.  Avro will be slower than a trivial deserializer (but&#010;more compact), but being many times slower is not expected.  I am not&#010;entirely sure how Hive's Avro SerDe works -- it is possible there is a&#010;performance issue there.  If you were able to get a handful of stack&#010;traces (kill -3 or jstack) from the mapper tasks (or a profiler output),&#010;it would be very insightful.&#010;&#010;&#010;On 5/23/13 12:42 AM, "nir_zamir" &lt;nir.zamir@gmail.com&gt; wrote:&#010;&#010;&gt;Hi,&#010;&gt;&#010;&gt;We're examining the storage of our data in Snappy-compressed files. Since&#010;&gt;we&#010;&gt;want the data's structure to be self-contained, we checked it with Avro&#010;&gt;and&#010;&gt;with Sequence (both are splittable, which should best utilize our&#010;&gt;cluster).&#010;&gt;&#010;&gt;We tested the performance on a 12GB data (CSV) file, and a 4-node cluster&#010;&gt;on production environment (very strong machines).&#010;&gt;&#010;&gt;Compression&#010;&gt;&#010;&gt;What we did here (for test simplicity) is create two Hive tables:&#010;&gt;Avro-based&#010;&gt;and Sequence-based. Then we enabled Snappy compression and INSERTed the&#010;&gt;data&#010;&gt;from the RAW table (consisting of the 12GB file).&#010;&gt;&#010;&gt;In terms of compression rate, Avro was better: 72% vs. 57%.&#010;&gt;In both cases there were 45 mappers, and CPU/Mem were very far from their&#010;&gt;limit on all machines.&#010;&gt;Since there was no reduce operator, this created 45 files.&#010;&gt;&#010;&gt;Compression time for Avro took longer: 1.75 minutes vs. 1.2 minutes for&#010;&gt;sequence files.&#010;&gt;&#010;&gt;Decompression&#010;&gt;&#010;&gt;What we did here was this Hive query:&#010;&gt;SELECT COUNT(1) FROM table-name;&#010;&gt;&#010;&gt;Here was the real difference: it took Avro about *6 times as long* to perform&#010;&gt;this (3 minutes vs. 0.5 minute).&#010;&gt;This was very surprising since for our strong machines the I/O would be&#010;&gt;expected to be the bottleneck, and since Avro files are smaller, we&#010;&gt;expected&#010;&gt;them to be faster to decompress.&#010;&gt;The number of mappers in both cases was similar (14 vs. 17) and again,&#010;&gt;CPU/Mem didn't seem to be exhausted.&#010;&gt;Since our most critical time is reading, this issue makes it hard for us&#010;&gt;to&#010;&gt;be using Avro.&#010;&gt;&#010;&gt;Maybe we're doing something wrong - your input would be much appreciated!&#010;&gt;&#010;&gt;Thanks,&#010;&gt;Nir&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;--&#010;&gt;View this message in context:&#010;&gt;http://apache-avro.679487.n3.nabble.com/Compressed-Avro-vs-compressed-Sequence-unexpected-results-tp4027467.html&#010;&gt;Sent from the Avro - Users mailing list archive at Nabble.com.&#010;&#010;&#010;&#010;
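For the header check suggested at the top of this reply, here is a minimal
sketch of reading the codec name straight from a file, in case avro-tools is
not at hand. It assumes the object container layout described in the Avro
specification (a 4-byte magic, then a metadata map that may hold an
"avro.codec" entry). The helper names read_long, read_bytes,
read_file_metadata and codec_of are made up for this illustration and are not
part of any Avro library.

```python
import io

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded)."""
    value, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of header")
        byte = chunk[0]
        value += (byte % 128) * (2 ** shift)  # low 7 bits carry data
        shift += 7
        if byte // 128 == 0:  # high bit clear: last byte of this number
            break
    # Undo zigzag: even values are non-negative, odd values are negative.
    return value // 2 if value % 2 == 0 else -((value + 1) // 2)


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro bytes/string encoding)."""
    return stream.read(read_long(stream))


def read_file_metadata(stream):
    """Return the header metadata map of an Avro container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            return meta
        if count != abs(count):  # negative count: a byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)


def codec_of(stream):
    """Codec name; a missing avro.codec entry is treated as 'null'."""
    return read_file_metadata(stream).get("avro.codec", b"null").decode("utf-8")


# Tiny hand-built header for demonstration: one map block with two entries.
sample = io.BytesIO(
    MAGIC
    + b"\x04"                            # block count 2 (zigzag-encoded)
    + b"\x14avro.codec" + b"\x0csnappy"  # key length 10, value length 6
    + b"\x16avro.schema" + b'\x10"string"'  # key length 11, value length 8
    + b"\x00"                            # end of map
    + bytes(16)                          # sync marker
)
print(codec_of(sample))  # snappy
```

On a real file the same call would be codec_of(open(path, "rb")), where path
points at one of the files written by the Hive job (the path is yours to fill
in). If it prints null, the data was written uncompressed.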
</pre>
</div>
</content>
</entry>
<entry>
<title>why isnt there an option for a default enum so that the writer can add new enum symbols</title>
<author><name>Jim Donofrio &lt;donofrio111@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c519E2CF0.8000300@gmail.com%3e"/>
<id>urn:uuid:%3c519E2CF0-8000300@gmail-com%3e</id>
<updated>2013-05-23T14:51:28Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The schema resolution page says:&#010;&#010; &gt; if both are enums:&#010; &gt; if the writer's symbol is not present in the reader's enum, then an &#010;error is signalled.&#010;&#010;Is there a reason you could not allow us to provide one of the symbols &#010;as a default in the reader so that when the reader reads an enum with a &#010;symbol it doesn't have, that new symbol would get defaulted to the default &#010;symbol we provide?&#010;&#010;For example, with the schemas below this would currently fail if the old &#010;reader encountered an Enum1.C in the data. Why not provide users the &#010;option to set "default": "A" for example so that any unknown enums are &#010;binned into one? The user might have an unknown enum or something. &#010;Currently I was going to implement this in my application by just using &#010;an int type and looking up the ordinal of my own Java Enum. If the int &#010;is larger than any of the ordinals, the enum would be set to my own &#010;default value in the enum.&#010;&#010;Thanks.&#010;&#010;Original schema:&#010;&#010;{&#010;     "type": "record",&#010;     "name": "EnumExample",&#010;     "fields": [&#010;         {&#010;             "name": "enum1",&#010;             "type": {&#010;                 "type": "enum",&#010;                 "name": "Enum1",&#010;                 "symbols": ["A", "B"]&#010;             }&#010;         }&#010;     ]&#010;}&#010;&#010;New schema:&#010;&#010;{&#010;     "type": "record",&#010;     "name": "EnumExample",&#010;     "fields": [&#010;         {&#010;             "name": "enum1",&#010;             "type": {&#010;                 "type": "enum",&#010;                 "name": "Enum1",&#010;                 "symbols": ["A", "B", "C"]&#010;             }&#010;         }&#010;     ]&#010;}&#010;&#010;&#010;
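The int-based workaround described above could look roughly like the sketch
below. It is written in Python rather than Java only to keep it short, and it
assumes the field is stored as a plain Avro "int" instead of an enum. The
names SYMBOLS, DEFAULT and symbol_for are hypothetical, chosen for this
illustration.

```python
SYMBOLS = ["A", "B"]  # symbols the old reader knows about
DEFAULT = "A"         # catch-all symbol chosen by the application


def symbol_for(ordinal, symbols=SYMBOLS, default=DEFAULT):
    """Map an int read from the data to a known symbol, else the default."""
    if ordinal in range(len(symbols)):
        return symbols[ordinal]
    return default


# Ordinal 2 would be "C" for the new writer; the old reader bins it into "A".
print([symbol_for(i) for i in (0, 1, 2)])  # ['A', 'B', 'A']
```

The cost of this approach is that the symbol names no longer live in the Avro
schema, so reader and writer have to agree on the ordinal order by other
means.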
</pre>
</div>
</content>
</entry>
<entry>
<title>using Avro unions with HIVE</title>
<author><name>Ran S &lt;rans@liveperson.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c1369318523625-4027473.post@n3.nabble.com%3e"/>
<id>urn:uuid:%3c1369318523625-4027473-post@n3-nabble-com%3e</id>
<updated>2013-05-23T14:15:23Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;We started to work with Avro in CDH4 and to query the Avro files using Hive.&#010;This does work fine for us, except for unions.&#010;We do not understand how to query the data inside a union using Hive.&#010;&#010;For example, let's look at the following schema:&#010;&#010;{&#010;&#009;"type":"record", &#010;&#009;"name":"event", &#010;&#009;"namespace":"com.mysite",&#010;&#009;"fields":[&#010;    {&#010;        "name":"header",&#010;        "type":{&#010;            "type":"record", "name":"CommonHeader",&#010;            "fields":[{ "name":"eventTimeStamp", "type":"long", "default":-1&#010;},&#010;                      { "name":"globalUserId", "type":["null", "string"],&#010;"default":null } ]&#010;        },&#010;        "default":null&#010;    },&#010;    {&#010;        "name":"eventbody",&#010;        "type":{&#010;            "type":"record", "name":"eventbody",&#010;            "fields":[&#010;                {&#010;                    "name":"body",&#010;                    "type":[&#010;                       "null", &#010;                       {&#010;                        "type":"record",&#010;                        "name":"event1",&#010;                        "fields":[&#010;                            {&#010;                                "name":"event1Header", &#010;                                "type":["null", { "type":"array",&#010;"items":"string" }], "default":null&#010;                            },&#010;                            {&#010;                                "name":"event1Body",&#010;                                "type":["null", { "type":"array",&#010;"items":"string" }], "default":null&#010;                            }&#010;                        ]&#010;                    }, &#010;                   {&#010;                        "type":"record",&#010;                        "name":"event2",&#010;                        "fields":[&#010;                            {&#010;                                "name":"page",&#010;                                "type":{&#010;                                    "type":"record", "name":"URL",&#010;"fields":[{ "name":"url", "type":"string" }]&#010;                                },&#010;                                "default":null&#010;                            },&#010;                            {&#010;                                "name":"referrer", "type":"string",&#010;"default":null&#010;                            }&#010;                        ]&#010;                    }&#010;&#009;&#009;],&#010;                    "default":null&#010;                }&#010;            ]&#010;        },&#010;        "default":null&#010;    }&#010;]}&#010;&#010;Note that "body" is a union of three types:&#010;null, "event1" and "event2"&#010;&#010;So if I want to query fields inside event1, I first need to access it.&#010;I then write a HiveQL query like this:&#010;SELECT eventbody.body.??? from SRC&#010;&#010;My question is: what should I put in the ??? above to make this work?&#010;&#010;Thank you,&#010;Ran&#010;&#010;&#010;&#010;--&#010;View this message in context: http://apache-avro.679487.n3.nabble.com/using-Avro-unions-with-HIVE-tp4027473.html&#010;Sent from the Avro - Users mailing list archive at Nabble.com.&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Compressed Avro vs. compressed Sequence - unexpected results?</title>
<author><name>nir_zamir &lt;nir.zamir@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c1369294959978-4027467.post@n3.nabble.com%3e"/>
<id>urn:uuid:%3c1369294959978-4027467-post@n3-nabble-com%3e</id>
<updated>2013-05-23T07:42:39Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;&#010;We're examining the storage of our data in Snappy-compressed files. Since we&#010;want the data's structure to be self-contained, we checked it with Avro and&#010;with Sequence (both are splittable, which should best utilize our cluster).&#010;&#010;We tested the performance on a 12GB data (CSV) file, and a 4-node cluster&#010;on production environment (very strong machines).&#010;&#010;Compression&#010;&#010;What we did here (for test simplicity) is create two Hive tables: Avro-based&#010;and Sequence-based. Then we enabled Snappy compression and INSERTed the data&#010;from the RAW table (consisting of the 12GB file).&#010;&#010;In terms of compression rate, Avro was better: 72% vs. 57%.&#010;In both cases there were 45 mappers, and CPU/Mem were very far from their&#010;limit on all machines.&#010;Since there was no reduce operator, this created 45 files.&#010;&#010;Compression time for Avro took longer: 1.75 minutes vs. 1.2 minutes for&#010;sequence files.&#010;&#010;Decompression&#010;&#010;What we did here was this Hive query:&#010;SELECT COUNT(1) FROM table-name;&#010;&#010;Here was the real difference: it took Avro about *6 times as long* to perform&#010;this (3 minutes vs. 0.5 minute).&#010;This was very surprising since for our strong machines the I/O would be&#010;expected to be the bottleneck, and since Avro files are smaller, we expected&#010;them to be faster to decompress.&#010;The number of mappers in both cases was similar (14 vs. 17) and again,&#010;CPU/Mem didn't seem to be exhausted.&#010;Since our most critical time is reading, this issue makes it hard for us to&#010;be using Avro.&#010;&#010;Maybe we're doing something wrong - your input would be much appreciated!&#010;&#010;Thanks,&#010;Nir&#010;&#010;&#010;&#010;--&#010;View this message in context: http://apache-avro.679487.n3.nabble.com/Compressed-Avro-vs-compressed-Sequence-unexpected-results-tp4027467.html&#010;Sent from the Avro - Users mailing list archive at Nabble.com.&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Avro RPC: Python to Java isn't working for me...</title>
<author><name>Stefan Krawczyk &lt;stefan@nextdoor.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAA5kp3ozpiB5y4FEAB-wdf-B-RPPQA3=d+wFQ8qVdRJ_RbVN4A@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAA5kp3ozpiB5y4FEAB-wdf-B-RPPQA3=d+wFQ8qVdRJ_RbVN4A@mail-gmail-com%3e</id>
<updated>2013-05-22T23:36:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Yes, agreed, that would be a good approach to take. However, before I go and try&#010;to figure out how to write a NettyTransceiver in Python, I'd like to get&#010;some community input so that:&#010; - I understand why one wasn't built in the first place, and thus&#010;whether it is actually possible to build one in Python&#010; - I'm not duplicating effort if somebody is already working on it&#010;&#010;So, any NettyServer &amp; NettyTransceiver experts in the house?&#010;&#010;:)&#010;&#010;Cheers,&#010;&#010;Stefan&#010;&#010;&#010;&#010;On Wed, May 22, 2013 at 4:22 PM, Atin Sood &lt;soodatin@outlook.com&gt; wrote:&#010;&#010;&gt; Sure Stefan&#010;&gt;&#010;&gt; I am a newbie to both python and avro and have limited experience in&#010;&gt; networking.&#010;&gt;&#010;&gt; But if I am not wrong the problem is at python client end as I am guessing&#010;&gt; you must be using s.th like&#010;&gt;&#010;&gt;  client code - attach to the server and send a message&#010;&gt;&#010;&gt;     client = ipc.HTTPTransceiver(server_addr[0], server_addr[1])&#010;&gt;     requestor = ipc.Requestor(PROTOCOL, client)&#010;&gt;&#010;&gt;&#010;&gt; So even though you can go ahead and switch to NettyServer in java you will be limited&#010;to use httpserver because your python client uses http client&#010;&gt;&#010;&gt; https://github.com/phunt/avro-rpc-quickstart/blob/master/src/main/java/example/Main.java&#010;&gt;&#010;&gt; I guess the best way to get around this will be to look at source code of avro python&#010;client ipc.py file&#010;&gt;&#010;&gt; and add a new implementation besides the one that comes out of the box.&#010;&gt;&#010;&gt; class HTTPTransceiver(object):&#010;&gt;   """&#010;&gt;   A simple HTTP-based transceiver implementation.&#010;&gt;   Useful for clients but not for servers&#010;&gt;   """&#010;&gt;&#010;&gt;&#010;&gt; That said, again I am new to the whole thing so I might be totally wrong :)&#010;&gt;&#010;&gt;&#010;&gt; --&#010;&gt; Atin 
Sood&#010;&gt; Sent with Sparrow &lt;http://www.sparrowmailapp.com/?sig&gt;&#010;&gt;&#010;&gt; On Wednesday, May 22, 2013 at 11:55 AM, Stefan Krawczyk wrote:&#010;&gt;&#010;&gt; Hi Atin,&#010;&gt;&#010;&gt; Thanks for the response. Yes I understand I could use HTTPServer on the&#010;&gt; java side and things would work. However I'm after a solution where I can&#010;&gt; still have the java side use the NettyServer.&#010;&gt;&#010;&gt; Cheers,&#010;&gt;&#010;&gt; Stefan&#010;&gt;&#010;&gt;&#010;&gt; On Wed, May 22, 2013 at 4:11 AM, Atin Sood &lt;soodatin@outlook.com&gt; wrote:&#010;&gt;&#010;&gt; You can try looking into something that I wrote as an example&#010;&gt;&#010;&gt;&#010;&gt; https://github.com/atinsood/HESDataAnalyticsFinalProject/tree/master/javaXPython&#010;&gt;&#010;&gt; https://github.com/atinsood/HESDataAnalyticsFinalProject#javaxpython&#010;&gt;&#010;&gt; --&#010;&gt; Atin Sood&#010;&gt; Sent with Sparrow &lt;http://www.sparrowmailapp.com/?sig&gt;&#010;&gt;&#010;&gt; On Tuesday, May 21, 2013 at 11:18 PM, Stefan Krawczyk wrote:&#010;&gt;&#010;&gt; Hi,&#010;&gt;&#010;&gt; I am trying to use Avro RPC and have a python client talk to a java&#010;&gt; server, using the avro-rpc-quickstart&lt;https://github.com/phunt/avro-rpc-quickstart&gt;&#010;on&#010;&gt; github as a base (I made sure the avro version being pulled in was 1.7.4).&#010;&gt; However when I get my python client to talk to the java server I see this&#010;&gt; error:&#010;&gt;&#010;&gt; 2013-05-20 19:38:32,512 (pool-5-thread-2) [WARN -&#010;&gt; org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)]&#010;&gt; Unexpected exception from downstream.&#010;&gt; org.apache.avro.AvroRuntimeException: Excessively large list allocation&#010;&gt; request detected: 539959368 items! 
Connection closed.&#010;&gt; at&#010;&gt; org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)&#010;&gt;  at&#010;&gt; org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)&#010;&gt; at&#010;&gt; org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:286)&#010;&gt;  at&#010;&gt; org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:208)&#010;&gt; at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)&#010;&gt;  at&#010;&gt; org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)&#010;&gt; at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)&#010;&gt;  at&#010;&gt; org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)&#010;&gt; at&#010;&gt; org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)&#010;&gt;  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)&#010;&gt; at&#010;&gt; java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&#010;&gt;  at&#010;&gt; java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&#010;&gt; at java.lang.Thread.run(Thread.java:722)&#010;&gt;&#010;&gt; From digging around on the web I understand this is a NettyTransceiver&#010;&gt; issue, i.e. the python client isn't using it because it uses the&#010;&gt; HTTPTransceiver.&#010;&gt;&#010;&gt; I was wondering, what are my options for moving forward, other than&#010;&gt; getting the java server to use the HTTPTransceiver?&#010;&gt;&#010;&gt; Apologies if I have overlooked something that points out what I can do.&#010;&gt;&#010;&gt; Cheers,&#010;&gt;&#010;&gt; Stefan&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Avro RPC: Python to Java isn't working for me...</title>
<author><name>Atin Sood &lt;soodatin@outlook.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cBLU0-SMTP21048D36638D789693485D7D9A90@phx.gbl%3e"/>
<id>urn:uuid:%3cBLU0-SMTP21048D36638D789693485D7D9A90@phx-gbl%3e</id>
<updated>2013-05-22T23:22:12Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Sure Stefan&#010;&#010;I am a newbie to both python and avro and have limited experience in networking. &#010;&#010;But if I am not wrong the problem is at python client end as I am guessing you must be using&#010;s.th like&#010;&#010;client code - attach to the server and send a message&#010;    client = ipc.HTTPTransceiver(server_addr[0], server_addr[1])&#010;    requestor = ipc.Requestor(PROTOCOL, client)&#010;&#010;&#010;So even though you can go ahead and switch to NettyServer in java you will be limited to use&#010;httpserver because your python client uses http client&#010;&#010;https://github.com/phunt/avro-rpc-quickstart/blob/master/src/main/java/example/Main.java&#010;&#010;I guess the best way to get around this will be to look at source code of avro python client&#010;ipc.py file&#010;&#010;and add a new implementation besides the one that comes out of the box.&#010;&#010;class HTTPTransceiver(object): """ A simple HTTP-based transceiver implementation. Useful&#010;for clients but not for servers """ &#010;&#010;&#010;That said, again I am new to the whole thing so I might be totally wrong :) &#010;&#010;-- &#010;Atin Sood&#010;Sent with Sparrow (http://www.sparrowmailapp.com/?sig)&#010;&#010;&#010;On Wednesday, May 22, 2013 at 11:55 AM, Stefan Krawczyk wrote:&#010;&#010;&gt; Hi Atin,&#010;&gt; &#010;&gt; Thanks for the response. Yes I understand I could use HTTPServer on the java side and&#010;things would work. However I'm after a solution where I can still have the java side use the&#010;NettyServer. 
&#010;&gt; &#010;&gt; Cheers,&#010;&gt; &#010;&gt; Stefan&#010;&gt; &#010;&gt; &#010;&gt; On Wed, May 22, 2013 at 4:11 AM, Atin Sood &lt;soodatin@outlook.com (mailto:soodatin@outlook.com)&gt;&#010;wrote:&#010;&gt; &gt; You can try looking into something that I wrote as an example&#010;&gt; &gt; &#010;&gt; &gt; https://github.com/atinsood/HESDataAnalyticsFinalProject/tree/master/javaXPython&#010;&#010;&gt; &gt; &#010;&gt; &gt; https://github.com/atinsood/HESDataAnalyticsFinalProject#javaxpython &#010;&gt; &gt; &#010;&gt; &gt; -- &#010;&gt; &gt; Atin Sood&#010;&gt; &gt; Sent with Sparrow (http://www.sparrowmailapp.com/?sig)&#010;&gt; &gt; &#010;&gt; &gt; &#010;&gt; &gt; On Tuesday, May 21, 2013 at 11:18 PM, Stefan Krawczyk wrote:&#010;&gt; &gt; &#010;&gt; &gt; &gt; Hi,&#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; I am trying to use Avro RPC and have a python client talk to a java server,&#010;using the avro-rpc-quickstart (https://github.com/phunt/avro-rpc-quickstart) on github as&#010;a base (I made sure the avro version being pulled in was 1.7.4). However when I get my python&#010;client to talk to the java server I see this error: &#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; 2013-05-20 19:38:32,512 (pool-5-thread-2) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)]&#010;Unexpected exception from downstream.&#010;&gt; &gt; &gt; org.apache.avro.AvroRuntimeException: Excessively large list allocation request&#010;detected: 539959368 items! 
Connection closed.&#010;&gt; &gt; &gt; at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)&#010;&gt; &gt; &gt; at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)&#010;&gt; &gt; &gt; at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:286)&#010;&gt; &gt; &gt; at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:208)&#010;&gt; &gt; &gt; at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)&#010;&gt; &gt; &gt; at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)&#010;&gt; &gt; &gt; at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)&#010;&gt; &gt; &gt; at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)&#010;&gt; &gt; &gt; at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)&#010;&gt; &gt; &gt; at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)&#010;&gt; &gt; &gt; at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&#010;&gt; &gt; &gt; at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&#010;&gt; &gt; &gt; at java.lang.Thread.run(Thread.java:722)&#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; From digging around on the web I understand this is a NettyTransceiver issue,&#010;i.e. the python client isn't using it because it uses the HTTPTransceiver. &#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; I was wondering, what are my options for moving forward, other than getting&#010;the java server to use the HTTPTransceiver?&#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; Apologies if I have overlooked something that points out what I can do.&#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; Cheers,&#010;&gt; &gt; &gt; &#010;&gt; &gt; &gt; Stefan &#010;&gt; &gt; &#010;&gt; &#010;&#010;&#010;
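On the HTTP vs. Netty mismatch discussed in this thread: the number in the
quoted exception is consistent with the server reading the start of an HTTP
request as a Netty frame header. The sketch below assumes (it is not verified
against the Avro source here) that the frame header is two big-endian 32-bit
ints, a serial number followed by the number of buffers, and that the Python
HTTPTransceiver starts its request with "POST / HTTP/1.1".

```python
import struct

# First bytes an HTTP client puts on the wire for a POST to "/".
request_start = b"POST / HTTP/1.1\r\n"

# Assumed Netty frame header: serial (int32), then list size (int32),
# both in network byte order.
serial, list_size = struct.unpack("!ii", request_start[:8])

print(list_size)  # 539959368
```

The four bytes " / H" decode to exactly the 539959368 "items" reported in the
stack trace, which supports the reading that the server is healthy and simply
being spoken to in the wrong protocol.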
</pre>
</div>
</content>
</entry>
<entry>
<title>Newb question on importing JSON and defaults</title>
<author><name>&quot;Gregory (Grisha) Trubetskoy&quot; &lt;grisha@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cPine.OSX.4.64-MacPine.1305221715080.81909@ourmini.ispol.com%3e"/>
<id>urn:uuid:%3cPine-OSX-4-64-MacPine-1305221715080-81909@ourmini-ispol-com%3e</id>
<updated>2013-05-22T21:26:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&#010;Hello!&#010;&#010;I have a test.json file that looks like this:&#010;&#010;{"first":"John", "last":"Doe", "middle":"C"}&#010;{"first":"John", "last":"Doe"}&#010;&#010;(Second line does NOT have a "middle" element).&#010;&#010;And I have a test.schema file that looks like this:&#010;&#010;{"name":"test",&#010;  "type":"record",&#010;  "fields": [&#010;     {"name":"first",  "type":"string"},&#010;     {"name":"middle", "type":"string", "default":""},&#010;     {"name":"last",   "type":"string"}&#010;]}&#010;&#010;I then try to use fromjson, as follows, and it chokes on the second line:&#010;&#010;$ java -jar avro-tools-1.7.4.jar fromjson --schema-file test.schema test.json &gt; test.avro&#010;Exception in thread "main" org.apache.avro.AvroTypeException: Expected field name not found:&#010;middle&#010;         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)&#010;         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)&#010;         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)&#010;         at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)&#010;         at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)&#010;         at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)&#010;         at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)&#010;         at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)&#010;         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)&#010;         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)&#010;         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)&#010;         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)&#010;         at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:105)&#010;         at 
org.apache.avro.tool.Main.run(Main.java:80)&#010;         at org.apache.avro.tool.Main.main(Main.java:69)&#010;&#010;&#010;The short story is - I need to convert a bunch of JSON where an element &#010;may not be present sometimes, in which case I'd want it to default to &#010;something sensible, e.g. blank or null.&#010;&#010;According to the Schema Resolution "if the reader's record schema has a &#010;field that contains a default value, and writer's schema does not have a &#010;field with the same name, then the reader should use the default value &#010;from its field."&#010;&#010;I'm clearly missing something obvious, any help would be appreciated!&#010;&#010;Grisha&#010;&#010;&#010;
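A possible workaround for the missing-field case above, as a minimal sketch:
fill in the schema defaults in the JSON itself before handing it to fromjson.
This rests on the assumption that the JSON decoder wants every field present
and that defaults only come into play when a writer's schema is resolved
against a reader's schema. The function name fill_defaults is made up for this
illustration, and it only handles flat records like the one shown.

```python
import json

schema = json.loads("""
{"name":"test",
 "type":"record",
 "fields": [
    {"name":"first",  "type":"string"},
    {"name":"middle", "type":"string", "default":""},
    {"name":"last",   "type":"string"}
]}
""")

lines = [
    '{"first":"John", "last":"Doe", "middle":"C"}',
    '{"first":"John", "last":"Doe"}',
]


def fill_defaults(record, schema):
    """Return a copy of record with every schema field present.

    Fields missing from the input take the schema default; a missing
    field that has no default raises KeyError.
    """
    filled = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in record:
            filled[name] = record[name]
        else:
            filled[name] = field["default"]
    return filled


# Emit one complete JSON record per input line, ready for fromjson.
for line in lines:
    print(json.dumps(fill_defaults(json.loads(line), schema)))
```

Redirecting that output to a new file and running fromjson on it should then
see a "middle" value on every line.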
</pre>
</div>
</content>
</entry>
<entry>
<title>ETL in face of column renames</title>
<author><name>Mason &lt;mason@verbasoftware.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c519D01A1.9010108@verbasoftware.com%3e"/>
<id>urn:uuid:%3c519D01A1-9010108@verbasoftware-com%3e</id>
<updated>2013-05-22T17:34:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Dear list,&#010;&#010;I have what I imagine is a standard setup: a web application generates &#010;data in MySQL, which I want to analyze in Hadoop; I run a nightly &#010;process to extract tables of interest, Avroize, and dump into HDFS.&#010;&#010;This has worked great so far because the tools I'm using make it easy to &#010;load a directory tree of Avros with the same schema.&#010;&#010;The issue is what to do when schema changes occur in the SQL database. I &#010;believe column additions and deletions are handled automatically by the &#010;Avro loaders I'm using, but I need to deal with a column rename.&#010;&#010;My thinking is: I could bake the table schemas at time of ETL into the &#010;Avros, for historical record, but then manually copy that schema out as &#010;a "master" schema and apply it to all Avros for which it's appropriate; &#010;then when a column rename occurs, go back and edit the master schema.&#010;&#010;I've never used an external schema before, so please correct me if I &#010;misunderstand how they work.&#010;&#010;Anyone have wisdom to share on this topic? I'd love to hear from anyone &#010;who has done this, or has a better solution.&#010;&#010;-Mason&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Avro RPC: Python to Java isn't working for me...</title>
<author><name>Stefan Krawczyk &lt;stefan@nextdoor.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAA5kp3rsBfCVOXLDKM13pOcF0=-duGUO_e1uDwU9Kipkg2y7SA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAA5kp3rsBfCVOXLDKM13pOcF0=-duGUO_e1uDwU9Kipkg2y7SA@mail-gmail-com%3e</id>
<updated>2013-05-22T15:55:41Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Atin,&#010;&#010;Thanks for the response. Yes I understand I could use HTTPServer on the&#010;java side and things would work. However I'm after a solution where I can&#010;still have the java side use the NettyServer.&#010;&#010;Cheers,&#010;&#010;Stefan&#010;&#010;&#010;On Wed, May 22, 2013 at 4:11 AM, Atin Sood &lt;soodatin@outlook.com&gt; wrote:&#010;&#010;&gt; You can try looking into something that I wrote as an example&#010;&gt;&#010;&gt;&#010;&gt; https://github.com/atinsood/HESDataAnalyticsFinalProject/tree/master/javaXPython&#010;&gt;&#010;&gt; https://github.com/atinsood/HESDataAnalyticsFinalProject#javaxpython&#010;&gt;&#010;&gt; --&#010;&gt; Atin Sood&#010;&gt; Sent with Sparrow &lt;http://www.sparrowmailapp.com/?sig&gt;&#010;&gt;&#010;&gt; On Tuesday, May 21, 2013 at 11:18 PM, Stefan Krawczyk wrote:&#010;&gt;&#010;&gt; Hi,&#010;&gt;&#010;&gt; I am trying to use Avro RPC and have a python client talk to a java&#010;&gt; server, using the avro-rpc-quickstart&lt;https://github.com/phunt/avro-rpc-quickstart&gt;&#010;on&#010;&gt; github as a base (I made sure the avro version being pulled in was 1.7.4).&#010;&gt; However when I get my python client to talk to the java server I see this&#010;&gt; error:&#010;&gt;&#010;&gt; 2013-05-20 19:38:32,512 (pool-5-thread-2) [WARN -&#010;&gt; org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)]&#010;&gt; Unexpected exception from downstream.&#010;&gt; org.apache.avro.AvroRuntimeException: Excessively large list allocation&#010;&gt; request detected: 539959368 items! 
Connection closed.&#010;&gt; at&#010;&gt; org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)&#010;&gt;  at&#010;&gt; org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)&#010;&gt; at&#010;&gt; org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:286)&#010;&gt;  at&#010;&gt; org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:208)&#010;&gt; at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)&#010;&gt;  at&#010;&gt; org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)&#010;&gt; at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)&#010;&gt;  at&#010;&gt; org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)&#010;&gt; at&#010;&gt; org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)&#010;&gt;  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)&#010;&gt; at&#010;&gt; java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&#010;&gt;  at&#010;&gt; java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&#010;&gt; at java.lang.Thread.run(Thread.java:722)&#010;&gt;&#010;&gt; From digging around on the web I understand this is a NettyTransceiver&#010;&gt; issue, i.e. the python client isn't using it because it uses the&#010;&gt; HTTPTransceiver.&#010;&gt;&#010;&gt; I was wondering, what are my options for moving forward, other than&#010;&gt; getting the java server to use the HTTPTransceiver?&#010;&gt;&#010;&gt; Apologies if I have overlooked something that points out what I can do.&#010;&gt;&#010;&gt; Cheers,&#010;&gt;&#010;&gt; Stefan&#010;&gt;&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Avro RPC: Python to Java isn't working for me...</title>
<author><name>Atin Sood &lt;soodatin@outlook.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cBLU0-SMTP20990A7C26F41E22DA78BA8D9A90@phx.gbl%3e"/>
<id>urn:uuid:%3cBLU0-SMTP20990A7C26F41E22DA78BA8D9A90@phx-gbl%3e</id>
<updated>2013-05-22T11:11:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You can try looking into something that I wrote as an example&#010;&#010;https://github.com/atinsood/HESDataAnalyticsFinalProject/tree/master/javaXPython&#010;&#010;https://github.com/atinsood/HESDataAnalyticsFinalProject#javaxpython &#010;&#010;-- &#010;Atin Sood&#010;Sent with Sparrow (http://www.sparrowmailapp.com/?sig)&#010;&#010;&#010;On Tuesday, May 21, 2013 at 11:18 PM, Stefan Krawczyk wrote:&#010;&#010;&gt; Hi,&#010;&gt; &#010;&gt; I am trying to use Avro RPC and have a python client talk to a java server, using the&#010;avro-rpc-quickstart (https://github.com/phunt/avro-rpc-quickstart) on github as a base (I&#010;made sure the avro version being pulled in was 1.7.4). However when I get my python client&#010;to talk to the java server I see this error: &#010;&gt; &#010;&gt; 2013-05-20 19:38:32,512 (pool-5-thread-2) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)]&#010;Unexpected exception from downstream.&#010;&gt; org.apache.avro.AvroRuntimeException: Excessively large list allocation request detected:&#010;539959368 items! 
Connection closed.&#010;&gt; at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)&#010;&gt; at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)&#010;&gt; at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:286)&#010;&gt; at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:208)&#010;&gt; at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)&#010;&gt; at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)&#010;&gt; at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)&#010;&gt; at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)&#010;&gt; at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)&#010;&gt; at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)&#010;&gt; at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&#010;&gt; at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&#010;&gt; at java.lang.Thread.run(Thread.java:722)&#010;&gt; &#010;&gt; From digging around on the web I understand this is a NettyTransceiver issue, i.e. the&#010;python client isn't using it because it uses the HTTPTransceiver. &#010;&gt; &#010;&gt; I was wondering, what are my options for moving forward, other than getting the java&#010;server to use the HTTPTransceiver?&#010;&gt; &#010;&gt; Apologies if I have overlooked something that points out what I can do.&#010;&gt; &#010;&gt; Cheers,&#010;&gt; &#010;&gt; Stefan &#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Avro RPC: Python to Java isn't working for me...</title>
<author><name>Stefan Krawczyk &lt;stefan@nextdoor.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAA5kp3og_t9=utiwevs689hcMhkF+oVA-0O+CGHuJ+j5YpZXSA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAA5kp3og_t9=utiwevs689hcMhkF+oVA-0O+CGHuJ+j5YpZXSA@mail-gmail-com%3e</id>
<updated>2013-05-22T03:18:10Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;&#010;I am trying to use Avro RPC and have a python client talk to a java server,&#010;using the avro-rpc-quickstart &lt;https://github.com/phunt/avro-rpc-quickstart&gt; on&#010;github as a base (I made sure the avro version being pulled in was 1.7.4).&#010;However when I get my python client to talk to the java server I see this&#010;error:&#010;&#010;2013-05-20 19:38:32,512 (pool-5-thread-2) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.&#010;org.apache.avro.AvroRuntimeException: Excessively large list allocation request detected: 539959368 items! Connection closed.&#010;at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)&#010;at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)&#010;at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:286)&#010;at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:208)&#010;at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)&#010;at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)&#010;at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)&#010;at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)&#010;at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)&#010;at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)&#010;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&#010;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&#010;at java.lang.Thread.run(Thread.java:722)&#010;&#010;From digging around on the web I understand this is a NettyTransceiver&#010;issue, i.e. the python client isn't using it because it uses the&#010;HTTPTransceiver.&#010;&#010;I was wondering, what are my options for moving forward, other than getting&#010;the java server to use the HTTPTransceiver?&#010;&#010;Apologies if I have overlooked something that points out what I can do.&#010;&#010;Cheers,&#010;&#010;Stefan&#010;&#010;
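One possible reading of the number in that exception, offered as a sketch rather than a confirmed diagnosis. It assumes the Netty frame decoder reads two big-endian 32-bit integers (a serial and a list size) at the start of each frame, and that the Python HTTPTransceiver opens with the request line "POST / HTTP/1.1". Both the header layout and the request line are assumptions, not taken from this thread.

```python
import struct

# Assumed first bytes sent by an HTTP-based client (hypothetical request line).
request_start = b"POST / HTTP/1.1\r\n"

# Assumed Netty frame header: two big-endian signed 32-bit integers,
# a serial number followed by the number of buffers in the frame.
serial, list_size = struct.unpack("!ii", request_start[:8])

print(serial)     # the bytes "POST" read as an integer
print(list_size)  # the bytes " / H" read as an integer: 539959368
```

If those assumptions hold, the decoded list size equals the value in the log, which would be consistent with an HTTP client talking to a server that expects Netty framing.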
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Transform a single .avdl or .avpr into one or many .avsc</title>
<author><name>Bertrand Dechoux &lt;dechouxb@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAO6W-2cEapaqeC4ebOMW8B8B3-GosOe9K0x=hOEzGWvv3Df0ug@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAO6W-2cEapaqeC4ebOMW8B8B3-GosOe9K0x=hOEzGWvv3Df0ug@mail-gmail-com%3e</id>
<updated>2013-05-21T21:44:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Waiting for a review : https://issues.apache.org/jira/browse/AVRO-1337&#010;&#010;Regards&#010;&#010;Bertrand&#010;&#010;&#010;&#010;On Tue, May 21, 2013 at 9:50 PM, Bertrand Dechoux &lt;dechouxb@gmail.com&gt;wrote:&#010;&#010;&gt; After browsing the (nice) API, it seems indeed trivial.&#010;&gt;&#010;&gt; The Idl class allows to read/parse the related file.&#010;&gt;&#010;&gt; http://avro.apache.org/docs/current/api/java/org/apache/avro/compiler/idl/Idl.html#Idl%28java.io.File%29&#010;&gt;&#010;&gt; The Protocol object can then be requested from it.&#010;&gt;&#010;&gt; http://avro.apache.org/docs/current/api/java/org/apache/avro/compiler/idl/Idl.html#ProtocolDeclaration%28%29&#010;&gt;&#010;&gt; Of course, the types can then be requested from the Protocol itself.&#010;&gt;&#010;&gt; http://avro.apache.org/docs/current/api/java/org/apache/avro/Protocol.html#getTypes%28%29&#010;&gt;&#010;&gt; And then it is only a matter of serializing them. And actually the API&#010;&gt; provides even a 'pretty' option.&#010;&gt;&#010;&gt; http://avro.apache.org/docs/current/api/java/org/apache/avro/Schema.html#toString%28boolean%29&#010;&gt;&#010;&gt; I will definitely look at it (ie with javac). Contributing it as a tool&#010;&gt; would be nice. But I won't go into the maven land.&#010;&gt;&#010;&gt; Regards&#010;&gt;&#010;&gt; Bertrand&#010;&gt;&#010;&gt;&#010;&gt; On Fri, May 17, 2013 at 9:11 PM, Doug Cutting &lt;cutting@apache.org&gt; wrote:&#010;&gt;&#010;&gt;&gt; There's not a tool that does this currently.  Note however that&#010;&gt;&gt; existing tools will generate Java classes for each type in an IDL&#010;&gt;&gt; file, so if you're only using Java then you might not need a .avsc&#010;&gt;&gt; file for each type in the IDL.&#010;&gt;&gt;&#010;&gt;&gt; It would not be hard to add a tool (or an option to an existing tool)&#010;&gt;&gt; that wrote a .avsc file for each type in an .avdl file.  One could&#010;&gt;&gt; also add Maven support for this.  
If this is of interest, please file&#010;&gt;&gt; an issue in Jira.&#010;&gt;&gt;&#010;&gt;&gt; https://issues.apache.org/jira/browse/AVRO&#010;&gt;&gt;&#010;&gt;&gt; If you're willing and able, please provide an implementation.&#010;&gt;&gt; Otherwise hopefully someone else will help out.&#010;&gt;&gt;&#010;&gt;&gt; Cheers,&#010;&gt;&gt;&#010;&gt;&gt; Doug&#010;&gt;&gt;&#010;&gt;&gt; On Fri, May 17, 2013 at 7:10 AM, Jeremy Kahn &lt;trochee@trochee.net&gt; wrote:&#010;&gt;&gt; &gt; The "types" field in a protocol (.avro) may get you what you need. The&#010;&gt;&gt; &gt; corresponding schema objects should be able to render to well-formed&#010;&gt;&gt; avsc&#010;&gt;&gt; &gt; objects.&#010;&gt;&gt; &gt;&#010;&gt;&gt; &gt; On May 17, 2013 5:47 AM, "Bertrand Dechoux" &lt;dechouxb@gmail.com&gt; wrote:&#010;&gt;&gt; &gt;&gt;&#010;&gt;&gt; &gt;&gt; Hi,&#010;&gt;&gt; &gt;&gt;&#010;&gt;&gt; &gt;&gt; I have lots of avro schemas and most of them are about complex objects&#010;&gt;&gt; (ie&#010;&gt;&gt; &gt;&gt; nested definition of stuff). The syntax of avro idl is attractive in&#010;&gt;&gt; order&#010;&gt;&gt; &gt;&gt; to build something more readable and thus maintainable. However, it&#010;&gt;&gt; looks&#010;&gt;&gt; &gt;&gt; like I can't generate any avsc from a avdl (or the avpr generated from&#010;&gt;&gt; the&#010;&gt;&gt; &gt;&gt; avdl). I understand what is a protocol and I don't need one but the idl&#010;&gt;&gt; &gt;&gt; syntax is really attractive. Is there really no way to use it for that&#010;&gt;&gt; &gt;&gt; purpose?&#010;&gt;&gt; &gt;&gt;&#010;&gt;&gt; &gt;&gt; Regards&#010;&gt;&gt; &gt;&gt;&#010;&gt;&gt; &gt;&gt; Bertrand&#010;&gt;&gt; &gt;&gt;&#010;&gt;&gt; &gt;&gt;&#010;&gt;&gt; &gt;&gt; PS : I remember seeing a discussion about that subject but I can't find&#010;&gt;&gt; &gt;&gt; it.&#010;&gt;&gt;&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt; --&#010;&gt; Bertrand Dechoux&#010;&gt;&#010;&#010;&#010;&#010;-- &#010;Bertrand Dechoux&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Transform a single .avdl or .avpr into one or many .avsc</title>
<author><name>Bertrand Dechoux &lt;dechouxb@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAO6W-2eNAAo07FBax1MNX8W4iQoD=chtj8xCTA5LkZBZt9u+mQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAO6W-2eNAAo07FBax1MNX8W4iQoD=chtj8xCTA5LkZBZt9u+mQ@mail-gmail-com%3e</id>
<updated>2013-05-21T19:50:37Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
After browsing the (nice) API, it seems indeed trivial.&#010;&#010;The Idl class allows to read/parse the related file.&#010;http://avro.apache.org/docs/current/api/java/org/apache/avro/compiler/idl/Idl.html#Idl%28java.io.File%29&#010;&#010;The Protocol object can then be requested from it.&#010;http://avro.apache.org/docs/current/api/java/org/apache/avro/compiler/idl/Idl.html#ProtocolDeclaration%28%29&#010;&#010;Of course, the types can then be requested from the Protocol itself.&#010;http://avro.apache.org/docs/current/api/java/org/apache/avro/Protocol.html#getTypes%28%29&#010;&#010;And then it is only a matter of serializing them. And actually the API&#010;provides even a 'pretty' option.&#010;http://avro.apache.org/docs/current/api/java/org/apache/avro/Schema.html#toString%28boolean%29&#010;&#010;I will definitely look at it (ie with javac). Contributing it as a tool&#010;would be nice. But I won't go into the maven land.&#010;&#010;Regards&#010;&#010;Bertrand&#010;&#010;&#010;On Fri, May 17, 2013 at 9:11 PM, Doug Cutting &lt;cutting@apache.org&gt; wrote:&#010;&#010;&gt; There's not a tool that does this currently.  Note however that&#010;&gt; existing tools will generate Java classes for each type in an IDL&#010;&gt; file, so if you're only using Java then you might not need a .avsc&#010;&gt; file for each type in the IDL.&#010;&gt;&#010;&gt; It would not be hard to add a tool (or an option to an existing tool)&#010;&gt; that wrote a .avsc file for each type in an .avdl file.  One could&#010;&gt; also add Maven support for this.  
If this is of interest, please file&#010;&gt; an issue in Jira.&#010;&gt;&#010;&gt; https://issues.apache.org/jira/browse/AVRO&#010;&gt;&#010;&gt; If you're willing and able, please provide an implementation.&#010;&gt; Otherwise hopefully someone else will help out.&#010;&gt;&#010;&gt; Cheers,&#010;&gt;&#010;&gt; Doug&#010;&gt;&#010;&gt; On Fri, May 17, 2013 at 7:10 AM, Jeremy Kahn &lt;trochee@trochee.net&gt; wrote:&#010;&gt; &gt; The "types" field in a protocol (.avro) may get you what you need. The&#010;&gt; &gt; corresponding schema objects should be able to render to well-formed avsc&#010;&gt; &gt; objects.&#010;&gt; &gt;&#010;&gt; &gt; On May 17, 2013 5:47 AM, "Bertrand Dechoux" &lt;dechouxb@gmail.com&gt; wrote:&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; Hi,&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; I have lots of avro schemas and most of them are about complex objects&#010;&gt; (ie&#010;&gt; &gt;&gt; nested definition of stuff). The syntax of avro idl is attractive in&#010;&gt; order&#010;&gt; &gt;&gt; to build something more readable and thus maintainable. However, it&#010;&gt; looks&#010;&gt; &gt;&gt; like I can't generate any avsc from a avdl (or the avpr generated from&#010;&gt; the&#010;&gt; &gt;&gt; avdl). I understand what is a protocol and I don't need one but the idl&#010;&gt; &gt;&gt; syntax is really attractive. Is there really no way to use it for that&#010;&gt; &gt;&gt; purpose?&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; Regards&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; Bertrand&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt;&#010;&gt; &gt;&gt; PS : I remember seeing a discussion about that subject but I can't find&#010;&gt; &gt;&gt; it.&#010;&gt;&#010;&#010;&#010;&#010;-- &#010;Bertrand Dechoux&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>c++ DataFileWriter not doing validation</title>
<author><name>&quot;SCHENK, Jarrad&quot; &lt;jarrad.schenk@baesystems.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c709DF80E10957849B6621979F494E4223F0FD5E3@EDPSVEXM005.au.baesystems.com%3e"/>
<id>urn:uuid:%3c709DF80E10957849B6621979F494E4223F0FD5E3@EDPSVEXM005-au-baesystems-com%3e</id>
<updated>2013-05-21T03:22:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi List,&#010;&#010;I'm working with the c++ bindings to try to write data to avro files.&#010;&#010;Much of the documentation assumes that the types to be written (and the code to write the&#010;data) are generated using avrogencpp.&#010;&#010;In my case I have an existing set of type/struct hierarchies that I'm trying to write so I&#010;don't want to use the output of avrogencpp directly. Instead I am producing code that is very&#010;similar to what avrogencpp outputs but is adapted to suit my types.&#010;&#010;What I'm finding is that the c++ DataFileWriter does no validation between the schema that&#010;I provide and the datums that get written. As such any discrepancy between the schema and&#010;the datums that are written causes the file to be corrupted and essentially unreadable.&#010;&#010;I see that there is a ValidatingEncoder class that can be used when serialising to a memorystream&#010;(as per the Getting Started docs) but there doesn't appear to be any method for using this&#010;encoder with the DataFileWriter.&#010;&#010;Am I missing something? Is there a preferred way to make the writer do validation?&#010;&#010;Thanks&#010;&#010;Jarrad&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Secondary Sort Helper</title>
<author><name>Johannes Schulte &lt;johannes.schulte@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAKBEjTs+Hq1PxSr5-o5oBeW34Yqq2aei0yUXaOXORVJ4-0dopw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAKBEjTs+Hq1PxSr5-o5oBeW34Yqq2aei0yUXaOXORVJ4-0dopw@mail-gmail-com%3e</id>
<updated>2013-05-17T20:22:27Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi all,&#010;&#010;I am using a lot of secondary sort Comparators in my avro map reduce jobs.&#010;However I haven't found a comfortable way of constructing the grouping&#010;schema (sorting schema is mostly the default) or the partitioner for the&#010;binary data compare method in AvroKeyComparator.&#010;&#010;So in my jobs I mostly construct a manual schema with Schema.createRecord()&#010;but that is really verbose. I also have some jobs where I created new&#010;schema definitions with the correct grouping order but that is tedious&#010;as well.&#010;&#010;Does anyone have an idea of how to make this easier? I could think of&#010;either some meta annotations in the schema&#010;or some property-based reflection stuff. In the end it's&#010;mostly one field from the record that's partitioned on and one or two that&#010;are used as grouping comparators.&#010;&#010;I think the basic toolset for doing this should be there. I just need some&#010;hints.&#010;&#010;Cheers,&#010;&#010;Johannes&#010;&#010;
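A minimal sketch of the schema-annotation idea, assuming the grouping comparator honours the "order" attribute of record fields and skips fields marked "ignore" when comparing binary data. The helper name grouping_schema and the example record are hypothetical; the derived JSON would still have to be parsed into a Schema and handed to the job.

```python
import json

def grouping_schema(record_schema, group_fields):
    """Return a copy of record_schema in which every field that is not
    listed in group_fields is marked with "order": "ignore"."""
    fields = []
    for field in record_schema["fields"]:
        field = dict(field)  # shallow copy so the input stays untouched
        if field["name"] not in group_fields:
            field["order"] = "ignore"
        fields.append(field)
    return dict(record_schema, fields=fields)

# Hypothetical key record: group on user_id, sort on timestamp within a group.
key_schema = {
    "type": "record",
    "name": "SessionKey",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
}

grouped = grouping_schema(key_schema, {"user_id"})
print(json.dumps(grouped, indent=2))
```

The sorting schema stays the unmodified record, so only the grouping variant has to be derived, and it comes from a list of field names rather than a second hand-written schema.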
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Transform a single .avdl or .avpr into one or many .avsc</title>
<author><name>Doug Cutting &lt;cutting@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCALEq1Z-bfZSjPQqCztCh_qhkW_7LNXRnsVUYjz+Kx6F2rAiv4Q@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCALEq1Z-bfZSjPQqCztCh_qhkW_7LNXRnsVUYjz+Kx6F2rAiv4Q@mail-gmail-com%3e</id>
<updated>2013-05-17T19:11:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
There's not a tool that does this currently.  Note however that&#010;existing tools will generate Java classes for each type in an IDL&#010;file, so if you're only using Java then you might not need a .avsc&#010;file for each type in the IDL.&#010;&#010;It would not be hard to add a tool (or an option to an existing tool)&#010;that wrote a .avsc file for each type in an .avdl file.  One could&#010;also add Maven support for this.  If this is of interest, please file&#010;an issue in Jira.&#010;&#010;https://issues.apache.org/jira/browse/AVRO&#010;&#010;If you're willing and able, please provide an implementation.&#010;Otherwise hopefully someone else will help out.&#010;&#010;Cheers,&#010;&#010;Doug&#010;&#010;On Fri, May 17, 2013 at 7:10 AM, Jeremy Kahn &lt;trochee@trochee.net&gt; wrote:&#010;&gt; The "types" field in a protocol (.avro) may get you what you need. The&#010;&gt; corresponding schema objects should be able to render to well-formed avsc&#010;&gt; objects.&#010;&gt;&#010;&gt; On May 17, 2013 5:47 AM, "Bertrand Dechoux" &lt;dechouxb@gmail.com&gt; wrote:&#010;&gt;&gt;&#010;&gt;&gt; Hi,&#010;&gt;&gt;&#010;&gt;&gt; I have lots of avro schemas and most of them are about complex objects (ie&#010;&gt;&gt; nested definition of stuff). The syntax of avro idl is attractive in order&#010;&gt;&gt; to build something more readable and thus maintainable. However, it looks&#010;&gt;&gt; like I can't generate any avsc from a avdl (or the avpr generated from the&#010;&gt;&gt; avdl). I understand what is a protocol and I don't need one but the idl&#010;&gt;&gt; syntax is really attractive. Is there really no way to use it for that&#010;&gt;&gt; purpose?&#010;&gt;&gt;&#010;&gt;&gt; Regards&#010;&gt;&gt;&#010;&gt;&gt; Bertrand&#010;&gt;&gt;&#010;&gt;&gt;&#010;&gt;&gt; PS : I remember seeing a discussion about that subject but I can't find&#010;&gt;&gt; it.&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Transform a single .avdl or .avpr into one or many .avsc</title>
<author><name>Jeremy Kahn &lt;trochee@trochee.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCA+i_aEnnrVE+Cm7Z1mgGNOjcki18MokxCAA0=C+FGxcTxi+c1A@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCA+i_aEnnrVE+Cm7Z1mgGNOjcki18MokxCAA0=C+FGxcTxi+c1A@mail-gmail-com%3e</id>
<updated>2013-05-17T14:10:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The "types" field in a protocol (.avpr) may get you what you need. The&#010;corresponding schema objects should be able to render to well-formed avsc&#010;objects.&#010;On May 17, 2013 5:47 AM, "Bertrand Dechoux" &lt;dechouxb@gmail.com&gt; wrote:&#010;&#010;&gt; Hi,&#010;&gt;&#010;&gt; I have lots of avro schemas and most of them are about complex objects (ie&#010;&gt; nested definition of stuff). The syntax of avro idl is attractive in order&#010;&gt; to build something more readable and thus maintainable. However, it looks&#010;&gt; like I can't generate any avsc from a avdl (or the avpr generated from the&#010;&gt; avdl). I understand what is a protocol and I don't need one but the idl&#010;&gt; syntax is really attractive. Is there really no way to use it for that&#010;&gt; purpose?&#010;&gt;&#010;&gt; Regards&#010;&gt;&#010;&gt; Bertrand&#010;&gt;&#010;&gt;&#010;&gt; PS : I remember seeing a discussion about that subject but I can't find it.&#010;&gt;&#010;&#010;
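As a rough illustration of the "types" suggestion at the top of this message, the sketch below reads a protocol as plain JSON and emits one schema document per entry of its "types" array, using only the Python standard library. The example protocol and the helper name split_protocol are made up. It assumes the protocol-level namespace should be copied onto each type, and it does not inline types that are referenced by name from other types, so nested references would still need handling.

```python
import json

def split_protocol(avpr_text):
    """Map "Name.avsc" to pretty-printed schema JSON for each protocol type."""
    protocol = json.loads(avpr_text)
    namespace = protocol.get("namespace")
    schemas = {}
    for schema in protocol.get("types", []):
        schema = dict(schema)
        # Types inherit the protocol namespace unless they declare their own.
        if namespace and "namespace" not in schema:
            schema["namespace"] = namespace
        schemas[schema["name"] + ".avsc"] = json.dumps(schema, indent=2)
    return schemas

# Hypothetical protocol, as might be generated from an .avdl file.
example_avpr = """
{
  "protocol": "Example",
  "namespace": "com.example",
  "types": [
    {"type": "enum", "name": "Kind", "symbols": ["A", "B"]},
    {"type": "record", "name": "Event",
     "fields": [{"name": "kind", "type": "Kind"}]}
  ],
  "messages": {}
}
"""

for filename, text in split_protocol(example_avpr).items():
    print(filename)
    print(text)
```

Writing each returned string to a file of the given name would produce one .avsc per type; the "messages" part of the protocol is simply ignored.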
</pre>
</div>
</content>
</entry>
<entry>
<title>Transform a single .avdl or .avpr into one or many .avsc</title>
<author><name>Bertrand Dechoux &lt;dechouxb@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAO6W-2fJbKm0uUqHetvrkWumkXPR6dkARR2DGpaF=Lu1=J=RNw@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAO6W-2fJbKm0uUqHetvrkWumkXPR6dkARR2DGpaF=Lu1=J=RNw@mail-gmail-com%3e</id>
<updated>2013-05-17T12:46:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,&#010;&#010;I have lots of avro schemas and most of them are about complex objects (i.e.&#010;nested definitions of stuff). The syntax of avro idl is attractive for&#010;building something more readable and thus maintainable. However, it looks&#010;like I can't generate any avsc from an avdl (or the avpr generated from the&#010;avdl). I understand what a protocol is and I don't need one, but the idl&#010;syntax is really attractive. Is there really no way to use it for that&#010;purpose?&#010;&#010;Regards&#010;&#010;Bertrand&#010;&#010;&#010;PS: I remember seeing a discussion about that subject but I can't find it.&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Best practices for java enums...?</title>
<author><name>Felix GV &lt;felix@mate1inc.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAECHK7sx1qV8EbO62a=D1Am9xbfqkTgKSaGFJ5CqT8YS2LGLyQ@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAECHK7sx1qV8EbO62a=D1Am9xbfqkTgKSaGFJ5CqT8YS2LGLyQ@mail-gmail-com%3e</id>
<updated>2013-05-16T23:51:37Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Scott,&#010;&#010;How would you envision this working with the avro compiler? Would it be&#010;akin to the capabilities added by&#010;AVRO-1188 &lt;https://issues.apache.org/jira/browse/AVRO-1188&gt; where&#010;the existing java enums would be referenced in the importedFiles directory&#010;passed as a parameter to the compiler?&#010;&#010;Should I file another JIRA for this...?&#010;&#010;--&#010;Felix&#010;&#010;&#010;On Mon, May 13, 2013 at 5:38 PM, Scott Carey &lt;scottcarey@apache.org&gt; wrote:&#010;&#010;&gt; It would be nice to be able to reference an existing class when using the&#010;&gt; specific compiler.&#010;&gt;&#010;&gt; If you have an existing "com.mycompany.Foo" enum (or SpecificRecord, or&#010;&gt; Fixed type), then provide the specific compiler with the type prior to&#010;&gt; parsing the schema, it could accept a reference:&#010;&gt;&#010;&gt; {"type":"record", "name":"com.mycompany.Rec", "fields": [&#010;&gt;   {"name":"fooField", "type":"com.mycompany.Foo"}&#010;&gt; ]}&#010;&gt;&#010;&gt; Ordinarily, this would fail to compile, but given a reference to an&#010;&gt; existing compatible type, such as an enum, it could work.&#010;&gt;&#010;&gt; -Scott&#010;&gt;&#010;&gt; On 5/9/13 4:39 PM, "Felix GV" &lt;felix@mate1inc.com&gt; wrote:&#010;&gt;&#010;&gt; Hello,&#010;&gt;&#010;&gt; I'm currently writing an avro schema which includes an enum field that I&#010;&gt; already have as a java enum in my application.&#010;&gt;&#010;&gt; At first, I named the avro field with the same fully qualified name&#010;&gt; (package name dot enum name) as my existing java enum. 
I then ran the avro&#010;&gt; compiler and found that it overwrote my existing java enum with an&#010;&gt; avro-generated enum.&#010;&gt;&#010;&gt; I find this slightly annoying because my java enum had comments&#010;&gt; documenting the purpose of each enum value, and the avro-generated enum&#010;&gt; doesn't have this.&#010;&gt;&#010;&gt; I see two or three potential solutions:&#010;&gt;&#010;&gt;    1. Accepting to replace my current enum with the avro-generated one in&#010;&gt;    my code base, which I feel I cannot document properly (since I have access&#010;&gt;    to just one doc attribute for the whole enum, instead of per symbol). On a&#010;&gt;    side note, I haven't found any way to have a multi-line doc attribute in an&#010;&gt;    avro schema, so that makes things slightly more annoying still. I wouldn't&#010;&gt;    mind settling on using the avro-generated enums without documentation per&#010;&gt;    symbol if at least I could have one big doc/comment that documents all&#010;&gt;    symbols at once, but since it seems the doc attribute must be a one-liner,&#010;&gt;    this is starting to be a little too messy for my taste...&#010;&gt;    2. Maintaining two separate enums: my manually written (and&#010;&gt;    documented) enum as well as the avro-generated enum. For now, I think this&#010;&gt;    is what I'm going to do, because those enums have little chances of&#010;&gt;    changing anyway, but from a maintenance standpoint, it seems pretty&#010;&gt;    horrendous...&#010;&gt;    3. I guess there's a third way, which would involve creating a script&#010;&gt;    that backs up my enums, compiles all my schemas, and then restores my&#010;&gt;    backed up enums, but this also seems ultra messy :( ... I haven't tested if&#010;&gt;    it'd work (since the manually written enum is missing the $SCHEMA field),&#010;&gt;    but I guess it would...&#010;&gt;&#010;&gt; Am I being OCD about this? or is this a concern that others have bumped&#010;&gt; into? 
How do you guys deal with this? Did I miss anything in the way avro&#010;&gt; works?&#010;&gt;&#010;&gt; P.S.: I've seen that reflect mappings may be able to work with arbitrary&#010;&gt; java enums, but since they seemed discouraged for performance reasons, I&#010;&gt; haven't digged much in this direction. I'd like to keep using .avsc files&#010;&gt; if possible, but if there's a better way, I can certainly try it.&#010;&gt;&#010;&gt; P.P.S.: We're currently using avro 1.6.1, but if the latest version&#010;&gt; provides a nice way of handling my use case, then I guess I could get us to&#010;&gt; upgrade...&#010;&gt;&#010;&gt; Thanks a lot :) !&#010;&gt;&#010;&gt; --&#010;&gt; Felix&#010;&gt;&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: avrò and mongo</title>
<author><name>Russell Jurney &lt;russell.jurney@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCANSvDjrjMcm7rNGBhFBV2=ZG=0r-esRdRx8OzVyKHrZogcFwJg@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCANSvDjrjMcm7rNGBhFBV2=ZG=0r-esRdRx8OzVyKHrZogcFwJg@mail-gmail-com%3e</id>
<updated>2013-05-15T23:36:46Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
FWIW, I have a simple script, avro_to_mongo.pig that does the conversion:&#010;https://github.com/rjurney/enron-node-mongo/blob/master/avro_to_mongo.pig&#010;&#010;&#010;On Wed, May 15, 2013 at 4:12 PM, Atin Sood &lt;soodatin@outlook.com&gt; wrote:&#010;&#010;&gt; I am thinking of a use case of creating a dynamic UI where a user gets to&#010;&gt; pick fields that they want and the data type, other attributes like whether&#010;&gt; the fi&#010;&gt; using this I am thinking of generating an avro schema which can be saved&#010;&gt; somewhere&#010;&gt; now when the user comes back to the UI which is based out of this schema&#010;&gt; and enters a bunch of values I would want to validate this data against&#010;&gt; this avro and save it in mongo&#010;&gt;&#010;&gt; just wondering if anyone has tried saving avro data into mongo&#010;&gt; and whats the best way to go about it.. new to avro to there might be a&#010;&gt; fundamental flaw in my thinking&#010;&gt;&#010;&#010;&#010;&#010;-- &#010;Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>avrò and mongo</title>
<author><name>Atin Sood &lt;soodatin@outlook.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cBLU0-SMTP153784A9530F5A993EC5A10D9A20@phx.gbl%3e"/>
<id>urn:uuid:%3cBLU0-SMTP153784A9530F5A993EC5A10D9A20@phx-gbl%3e</id>
<updated>2013-05-15T23:12:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I am thinking of a use case of creating a dynamic UI where a user gets to pick fields that&#010;they want and the data type, other attributes like whether the fi&#010;Using this, I am thinking of generating an avro schema which can be saved somewhere.&#010;Now when the user comes back to the UI, which is based on this schema, and enters a bunch&#010;of values, I would want to validate this data against this avro schema and save it in mongo.&#010;&#010;Just wondering if anyone has tried saving avro data into mongo&#010;and what's the best way to go about it. I'm new to avro, so there might be a fundamental flaw in&#010;my thinking.&#010;&#010;
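A minimal sketch of the validation step, assuming the UI only produces flat records whose fields are primitive types or unions of primitives. The validator below is hand-written with the standard library and is not the Avro library's own validation; the schema, the field names and the function names are hypothetical. A document that passes could then be stored as-is in a document database.

```python
# Checks for a few Avro primitive types; bool is excluded from the
# integer checks because bool is a subclass of int in Python.
def _is_int(value, bits):
    limit = 2 ** (bits - 1)
    return type(value) is int and value in range(-limit, limit)

CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: type(v) is bool,
    "int": lambda v: _is_int(v, 32),
    "long": lambda v: _is_int(v, 64),
    "double": lambda v: type(v) in (int, float),
    "string": lambda v: isinstance(v, str),
}

def matches(field_type, value):
    """True if value fits a primitive type name or a union (list) of them."""
    if isinstance(field_type, list):
        return any(matches(branch, value) for branch in field_type)
    return CHECKS[field_type](value)

def validate(schema, document):
    """Return the names of fields that are missing or have the wrong type."""
    missing = object()
    bad = []
    for field in schema["fields"]:
        value = document.get(field["name"], missing)
        if value is missing or not matches(field["type"], value):
            bad.append(field["name"])
    return bad

# Hypothetical schema generated from the user's field choices.
form_schema = {
    "type": "record",
    "name": "UserForm",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "nickname", "type": ["null", "string"]},
    ],
}

print(validate(form_schema, {"name": "Ann", "age": 30, "nickname": None}))
print(validate(form_schema, {"name": "Ann", "age": "thirty"}))
```

The first call prints an empty list, and the second reports "age" and "nickname". Nested records, arrays, maps and enums are outside this sketch, and an unknown type name raises a KeyError.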
</pre>
</div>
</content>
</entry>
<entry>
<title>How best to represent this in a union.</title>
<author><name>William McKenzie &lt;wsmckenz@cartewright.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAC7g-J185SOthr2sWGJBFWuRvosH8DaPz-5z3LJGWHpCthTt1A@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAC7g-J185SOthr2sWGJBFWuRvosH8DaPz-5z3LJGWHpCthTt1A@mail-gmail-com%3e</id>
<updated>2013-05-14T21:27:59Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Just trying to validate that this is a good approach. We currently have a union&#010;schema that we use to define a simple data item of type "any" (like the old&#010;COM Variant):&#010;&#010;{"name": "item", "type": [ "double",  "float",  "int",  "long",  "string",&#010; "DateTime",  "null"] }&#010;&#010;I'd like to add another union member that works like "null", in that no&#010;data ever gets written except the union discriminator itself. We are&#010;streaming time-series data, and this value would have a special meaning of&#010;"value is unchanged". I could make it an enum with just one value, but then&#010;you would write at least two bytes. So I'm thinking I can make a record:&#010;&#010;{&#010;"type": "record",&#010;"name": "Unchanged",&#010;"fields":&#010;[&#010;{ "name": "item", "type": "null" }&#010;]&#010;}&#010;&#010;and then my union becomes&#010;&#010;{&#010; "name": "item", "type": [ "double",  "float",  "int",  "long",  "string",&#010; "DateTime",  "null", "Unchanged"]&#010;}&#010;&#010;Seem reasonable?&#010;&#010;
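To check the size argument, here is a small hand-rolled sketch of how a union value is laid out in Avro's binary encoding: the branch index is written as a zig-zag varint, followed by the encoded value. It assumes, per the binary encoding rules, that null occupies zero bytes, that a record is the concatenation of its encoded fields, and that an enum is written as the index of its symbol. The function names and the single-symbol enum are illustrative only.

```python
def encode_long(n):
    """Zig-zag varint encoding as used for Avro int and long values."""
    zigzag = 2 * n if n == abs(n) else -2 * n - 1
    out = bytearray()
    while True:
        zigzag, low = divmod(zigzag, 128)
        if zigzag == 0:
            out.append(low)          # last byte: continuation bit clear
            return bytes(out)
        out.append(low + 128)        # more bytes follow: continuation bit set

def encode_union(branch_index, encoded_value):
    """A union is its branch index followed by the value of that branch."""
    return encode_long(branch_index) + encoded_value

# Branch positions in the proposed union:
# double, float, int, long, string, DateTime, null, Unchanged
NULL_INDEX = 6
UNCHANGED_INDEX = 7

null_value = b""                  # null is written as zero bytes
unchanged_record = null_value     # record with one null field: still zero bytes
one_symbol_enum = encode_long(0)  # enum alternative: index of its only symbol

print(len(encode_union(NULL_INDEX, null_value)))
print(len(encode_union(UNCHANGED_INDEX, unchanged_record)))
print(len(encode_union(UNCHANGED_INDEX, one_symbol_enum)))
```

Under those assumptions both null and the Unchanged record cost one byte in total, while the single-symbol enum costs two.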
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: ArrayIndexOutOfBoundsException in Symbol.getSymbol in map reduce job</title>
<author><name>Sripad Sriram &lt;sripad@path.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAA3dCByiD7tO8LQJ=6H_HknwVvRNPjK_NJEJiD20tN7=bz73Ww@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAA3dCByiD7tO8LQJ=6H_HknwVvRNPjK_NJEJiD20tN7=bz73Ww@mail-gmail-com%3e</id>
<updated>2013-05-14T02:20:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I'm using Avro 1.7.4, and the error is coming between a map step and reduce&#010;step. I'm unsure how an Avro record could get corrupted between a mapper&#010;and reducer, but it has, how would I be able to handle that exception?&#010;&#010;&#010;On Mon, May 13, 2013 at 3:40 PM, Harsh J &lt;harsh@cloudera.com&gt; wrote:&#010;&#010;&gt; Its difficult to tell what the error means without context and other&#010;&gt; info (such as version). If I had to guess, I think there may be a&#010;&gt; corruption on the file being processed here. Does running the file&#010;&gt; through avro-tools' tojson sub-command end up in a successful read?&#010;&gt;&#010;&gt; On Tue, May 14, 2013 at 3:28 AM, Sripad Sriram &lt;sripad@path.com&gt; wrote:&#010;&gt; &gt; Hi all,&#010;&gt; &gt;&#010;&gt; &gt; A java hadoop job that's previously executed without issue began erroring&#010;&gt; &gt; with the following stack trace - have any of you seen this before?&#010;&gt; &gt;&#010;&gt; &gt; java.lang.ArrayIndexOutOfBoundsException: 14&#010;&gt; &gt;         at&#010;&gt; &gt; org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)&#010;&gt; &gt;         at&#010;&gt; &gt; org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)&#010;&gt; &gt;         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)&#010;&gt; &gt;         at&#010;&gt; &gt; org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)&#010;&gt; &gt;         at&#010;&gt; &gt;&#010;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)&#010;&gt; &gt;         at&#010;&gt; &gt;&#010;&gt; org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)&#010;&gt; &gt;         at&#010;&gt; &gt;&#010;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)&#010;&gt; &gt;         at&#010;&gt; &gt;&#010;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)&#010;&gt; &gt;         at&#010;&gt; 
&gt;&#010;&gt; org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:83)&#010;&gt; &gt;         at&#010;&gt; &gt;&#010;&gt; org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:65)&#010;&gt; &gt;         at&#010;&gt; &gt; org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1262)&#010;&gt; &gt;         at&#010;&gt; &gt; org.apache.hadoop.mapred.Task$ValuesIterator.nextKey(Task.java:1233)&#010;&gt; &gt;         at&#010;&gt; &gt; org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:533)&#010;&gt; &gt;         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)&#010;&gt; &gt;         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)&#010;&gt; &gt;         at java.security.AccessController.doPrivileged(Native Method)&#010;&gt; &gt;         at javax.security.auth.Subject.doAs(Subject.java:396)&#010;&gt; &gt;         at&#010;&gt; &gt;&#010;&gt; org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)&#010;&gt; &gt;         at org.apache.hadoop.mapred.Child.main(Child.java:249)&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt; --&#010;&gt; Harsh J&#010;&gt;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Jackson and Avro, nested schema</title>
<author><name>Doug Cutting &lt;cutting@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCALEq1Z8gd1+kCSYsPm0w1gjXUnt4km2j22bfGm_mPM9iCWuLsg@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCALEq1Z8gd1+kCSYsPm0w1gjXUnt4km2j22bfGm_mPM9iCWuLsg@mail-gmail-com%3e</id>
<updated>2013-05-13T23:25:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Wed, May 8, 2013 at 11:49 AM, David Arthur &lt;mumrah@gmail.com&gt; wrote:&#010;&gt; I have looked at the Json schema included with Avro, but this requires a&#010;&gt; top-level "value" element which I don't want.&#010;&#010;There's code in Avro that will read and write Jackson JsonNode&#010;directly, without creating any intermediate "value" structure.&#010;&#010;http://avro.apache.org/docs/current/api/java/org/apache/avro/data/Json.html&#010;&#010;One should be able to easily write a JsonParser and JsonGenerator that&#010;read and write directly using this schema, so that Jackson's&#010;ObjectCodec could then be used to read and write arbitrary Pojos to&#010;Avro data files.&#010;&#010;Doug&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Hadoop serialization DatumReader/Writer</title>
<author><name>Marshall Bockrath-Vandegrift &lt;llasram@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3c87txm6zcqe.fsf@zeno.atl.damballa%3e"/>
<id>urn:uuid:%3c87txm6zcqe-fsf@zeno-atl-damballa%3e</id>
<updated>2013-05-13T23:22:49Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Scott Carey &lt;scottcarey@apache.org&gt; writes:&#010;&#010;&gt; Making the DatumReader/Writers configurable would be a welcome&#010;&gt; addition.&#010;&#010;Excellent!&#010;&#010;&gt; Ideally, much more of what goes on there could be:&#010;&gt;  1. configuration driven&#010;&gt;  2. pre-computed to avoid repeated work during decoding/encoding&#010;&gt;&#010;&gt; We do some of both already.  The trick is to do #1 without impacting&#010;&gt; performance and #2 requires a bigger overhaul.&#010;&#010;Which work in particular?  In my pass through the AvroSerialization&#010;implementation so far, it looks like each MR task would create either&#010;one or two Serializers/Deserializers (key and value), each of which in&#010;turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.&#010;Or do De/Serializers get created multiple times per task?&#010;&#010;&gt; If you would like, a contribution including a Clojure related maven&#010;&gt; module or two that depends on the Java stuff would be a welcome&#010;&gt; addition and allow us to identify compatibility issues as we change&#010;&gt; the Java library over time.&#010;&#010;That sounds like a great end-goal.  Right now at the company I work for&#010;(Damballa) we've just started getting our toes wet with Avro.  Avro won&#010;our serialization-format bake-off, but we haven't started actually using&#010;it.  I just finished an initial pass at Avro-Clojure integration and we&#010;have released it under an open source license:&#010;&#010;    https://github.com/damballa/abracad&#010;&#010;I would very much like to eventually get an iteration of it into Avro&#010;proper, but I want to actually start using it and Avro first, so we can&#010;hammer out any interface issues etc.&#010;&#010;Anyway, I'll try to work up a patch to add some more configuration hooks&#010;to the AvroSerialization.  Should I also create a ticket in the Avro&#010;issue tracker?&#010;&#010;-Marshall&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: ArrayIndexOutOfBoundsException in Symbol.getSymbol in map reduce job</title>
<author><name>Harsh J &lt;harsh@cloudera.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAOcnVr2n-F2f4yaqcNbN+aSkXGTd+xEiDJOt+oir-xD=BC1T8A@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAOcnVr2n-F2f4yaqcNbN+aSkXGTd+xEiDJOt+oir-xD=BC1T8A@mail-gmail-com%3e</id>
<updated>2013-05-13T22:40:56Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
It's difficult to tell what the error means without context and other&#010;info (such as version). If I had to guess, I think there may be&#010;corruption in the file being processed here. Does running the file&#010;through avro-tools' tojson sub-command end up in a successful read?&#010;&#010;On Tue, May 14, 2013 at 3:28 AM, Sripad Sriram &lt;sripad@path.com&gt; wrote:&#010;&gt; Hi all,&#010;&gt;&#010;&gt; A java hadoop job that's previously executed without issue began erroring&#010;&gt; with the following stack trace - have any of you seen this before?&#010;&gt;&#010;&gt; java.lang.ArrayIndexOutOfBoundsException: 14&#010;&gt;         at&#010;&gt; org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)&#010;&gt;         at&#010;&gt; org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)&#010;&gt;         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)&#010;&gt;         at&#010;&gt; org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)&#010;&gt;         at&#010;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)&#010;&gt;         at&#010;&gt; org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)&#010;&gt;         at&#010;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)&#010;&gt;         at&#010;&gt; org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)&#010;&gt;         at&#010;&gt; org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:83)&#010;&gt;         at&#010;&gt; org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:65)&#010;&gt;         at&#010;&gt; org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1262)&#010;&gt;         at&#010;&gt; org.apache.hadoop.mapred.Task$ValuesIterator.nextKey(Task.java:1233)&#010;&gt;         at&#010;&gt; 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:533)&#010;&gt;         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)&#010;&gt;         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)&#010;&gt;         at java.security.AccessController.doPrivileged(Native Method)&#010;&gt;         at javax.security.auth.Subject.doAs(Subject.java:396)&#010;&gt;         at&#010;&gt; org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)&#010;&gt;         at org.apache.hadoop.mapred.Child.main(Child.java:249)&#010;&#010;&#010;&#010;-- &#010;Harsh J&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Hadoop serialization DatumReader/Writer</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDB6B0D0.EE96B%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDB6B0D0-EE96B%25scott@richrelevance-com%3e</id>
<updated>2013-05-13T22:08:29Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Making the DatumReader/Writers configurable would be a welcome addition.&#010;&#010;Ideally, much more of what goes on there could be:&#010; 1. configuration driven&#010; 2. pre-computed to avoid repeated work during decoding/encoding&#010;&#010;We do some of both already.  The trick is to do #1 without impacting&#010;performance and #2 requires a bigger overhaul.&#010;&#010;If you would like, a contribution including a Clojure related maven module&#010;or two that depends on the Java stuff would be a welcome addition and&#010;allow us to identify compatibility issues as we change the Java library&#010;over time.&#010;&#010;&#010;On 5/8/13 3:33 PM, "Marshall Bockrath-Vandegrift" &lt;llasram@gmail.com&gt;&#010;wrote:&#010;&#010;&gt;Hi all:&#010;&gt;&#010;&gt;Is there a reason Avro's Hadoop serialization classes don't allow&#010;&gt;configuration of the DatumReader and DatumWriter classes?&#010;&gt;&#010;&gt;My use-case is that I'm implementing Clojure DatumReader and -Writer&#010;&gt;classes which produce and consume Clojure's data structures directly.&#010;&gt;I'd like to then extend that to Hadoop MapReduce jobs which operate in&#010;&gt;terms of Clojure data, with Avro handling all de/serialization directly&#010;&gt;to/from that Clojure data.&#010;&gt;&#010;&gt;Am I going around this in a backwards fashion, or would a patch to allow&#010;&gt;configuration of the Hadoop serialization DatumReader/Writers be&#010;&gt;welcome?&#010;&gt;&#010;&gt;-Marshall&#010;&gt;&#010;&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: avro.java.string vs utf8 compatibility in recent pig and hive versions</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDB6AEC8.EE942%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDB6AEC8-EE942%25scott@richrelevance-com%3e</id>
<updated>2013-05-13T21:59:33Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The change in the Pig loader in PIG-3297 seems correct: they must use&#010;CharSequence, not Utf8.&#010;&#010;I suspect that avro-1.5.3.jar does not respect the "avro.java.string"&#010;property and is using Utf8 (for the API that Pig is using), but have not&#010;confirmed it.  "avro.java.string" is an optional hint for the Java&#010;implementation.&#010;&#010;On the Avro side, we may be able to make a modification that allows one to&#010;configure a decoder or encoder to ignore the "avro.java.string" property.&#010;Perhaps it could look for a system property as an override to help with&#010;cases like this.&#010;&#010;&#010;On 5/10/13 3:16 PM, "Michael Moss" &lt;michael.moss@gmail.com&gt; wrote:&#010;&#010;&gt; Hello, &#010;&gt; &#010;&gt; It looks like representing avro strings as Utf8 provides some interesting&#010;&gt; performance enhancements, but I'm wondering if folks out there are actually&#010;&gt; using it in practice, or have had any issues with it.&#010;&gt; &#010;&gt; We have recently run into an issue where our avro files which represent&#010;&gt; strings as "avro.java.string" are causing ClassCastExceptions because Pig and&#010;&gt; Hive are expecting them to be Utf8. 
The exceptions occur when using&#010;&gt; avro-1.7.x.jar, but disappear when using version avro-1.5.3.jar.&#010;&gt; &#010;&gt; I'm wondering if this is something that should be addressed in the avro jar,&#010;&gt; or in pig and hive like this thread suggests:&#010;&gt; https://issues.apache.org/jira/browse/PIG-3297&#010;&gt; &#010;&gt; Here are the exceptions we are seeing:&#010;&gt; Hive:&#010;&gt; Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to&#010;&gt; org.apache.avro.util.Utf8&#010;&gt;         at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeMap(AvroDeserializer.java:253)&#010;&gt; &#010;&gt; Pig:&#010;&gt; Caused by: java.io.IOException: java.lang.ClassCastException: java.lang.String&#010;&gt; cannot be cast to org.apache.avro.util.Utf8&#010;&gt;         at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)&#010;&gt;         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)&#010;&gt;         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)&#010;&gt; &#010;&gt; Thanks.&#010;&gt; &#010;&gt; -Mike&#010;&gt; &#010;&gt; &#010;&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>ArrayIndexOutOfBoundsException in Symbol.getSymbol in map reduce job</title>
<author><name>Sripad Sriram &lt;sripad@path.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCAA3dCBy7nn7x196ddZ669+5ddXX+WTtJ7gH-JjCWGrWyLPw+TA@mail.gmail.com%3e"/>
<id>urn:uuid:%3cCAA3dCBy7nn7x196ddZ669+5ddXX+WTtJ7gH-JjCWGrWyLPw+TA@mail-gmail-com%3e</id>
<updated>2013-05-13T21:58:52Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi all,&#010;&#010;A java hadoop job that's previously executed without issue began erroring&#010;with the following stack trace - have any of you seen this before?&#010;&#010;java.lang.ArrayIndexOutOfBoundsException: 14&#010;        at&#010;org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)&#010;        at&#010;org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)&#010;        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)&#010;        at&#010;org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)&#010;        at&#010;org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)&#010;        at&#010;org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)&#010;        at&#010;org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)&#010;        at&#010;org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)&#010;        at&#010;org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:83)&#010;        at&#010;org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:65)&#010;        at&#010;org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1262)&#010;        at&#010;org.apache.hadoop.mapred.Task$ValuesIterator.nextKey(Task.java:1233)&#010;        at&#010;org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:533)&#010;        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)&#010;        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)&#010;        at java.security.AccessController.doPrivileged(Native Method)&#010;        at javax.security.auth.Subject.doAs(Subject.java:396)&#010;        at&#010;org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)&#010;        at org.apache.hadoop.mapred.Child.main(Child.java:249)&#010;&#010;
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Jackson and Avro, nested schema</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDB6AB71.EE8F4%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDB6AB71-EE8F4%25scott@richrelevance-com%3e</id>
<updated>2013-05-13T21:51:40Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
It appears that you will need to modify the JSON decoder in Avro to&#010;achieve this.&#010;&#010;The JSON encoding in Avro was built to encode data of any Avro schema into&#010;JSON with 100% fidelity, so that the decoder can read it back.  The decoder&#010;does not work with arbitrary JSON.&#010;&#010;This is because there are ambiguities:&#010;&#010;In your example:&#010;{&#010;  "id": "doc1",&#010;  "fields": {&#010;    "foo": "bar",&#010;    "spam": "eggs",&#010;    "answer": 42,&#010;    "x": {"a": 1}&#010;  }&#010;}&#010;&#010;&#010;This can be interpreted by Avro in several ways.  Is the value of "fields"&#010;a map or a record with four fields?  Is the value of "x" a map or a record&#010;with one field?  Is "answer" an int, long, float, or double?  Is "doc1"&#010;a string or a bytes literal?&#010;&#010;If you want to bake in the assumption that it is "maps, all the way down",&#010;you'll need to extend / modify the JSON Decoder.&#010;&#010;It would be a useful contribution to have a generic JSON schema and&#010;decoder for it.  We could have a "JSON" schema record (one field, a union&#010;of null, string, double, and map of string to self) and this type's field&#010;would automatically be un-nested by the special JSON decoder and not&#010;interpreted as a record.&#010;&#010;-Scott&#010;&#010;On 5/8/13 11:49 AM, "David Arthur" &lt;mumrah@gmail.com&gt; wrote:&#010;&#010;&gt;I'm attempting to use Jackson and Avro together to map JSON documents to&#010;&gt;a generated Avro class. 
I have looked at the Json schema included with&#010;&gt;Avro, but this requires a top-level "value" element which I don't want.&#010;&gt;Essentially, I have JSON documents that have a few typed top level&#010;&gt;fields, and one field called "fields" which is more or less arbitrary&#010;&gt;JSON.&#010;&gt;&#010;&gt;I've reduced this down to strings and ints for simplicity&#010;&gt;&#010;&gt;My first attempt was:&#010;&gt;&#010;&gt;  {&#010;&gt;     "type": "record",&#010;&gt;     "name": "Json",&#010;&gt;     "fields": [&#010;&gt;       {&#010;&gt;         "name": "value",&#010;&gt;         "type": [ "string", "int", {"type": "map", "values": "Json"} ]&#010;&gt;       }&#010;&gt;     ]&#010;&gt;   },&#010;&gt;&#010;&gt;   {&#010;&gt;     "name": "Document",&#010;&gt;     "type": "record",&#010;&gt;     "fields": [&#010;&gt;       {&#010;&gt;         "name": "id",&#010;&gt;         "type": "string"&#010;&gt;       },&#010;&gt;       {&#010;&gt;         "name": "fields",&#010;&gt;         "type": {"type": "map", "values": ["string", "int", {"type":&#010;&gt;"map", "values": "Json"}]}&#010;&gt;       }&#010;&gt;     ]&#010;&gt;   }&#010;&gt;&#010;&gt;Given a JSON document like:&#010;&gt;&#010;&gt;{&#010;&gt;   "id": "doc1",&#010;&gt;   "fields": {&#010;&gt;     "foo": "bar",&#010;&gt;     "spam": "eggs",&#010;&gt;     "answer": 42,&#010;&gt;     "x": {"a": 1}&#010;&gt;   }&#010;&gt;}&#010;&gt;&#010;&gt;this seems to work, but it doesn't. 
When I turn around and try to&#010;&gt;serialize this object with Avro, I get the following exception:&#010;&gt;&#010;&gt;java.lang.ClassCastException: java.lang.Integer cannot be cast to&#010;&gt;org.apache.avro.generic.IndexedRecord&#010;&gt;     at org.apache.avro.generic.GenericData.getField(GenericData.java:526)&#010;&gt;     at org.apache.avro.generic.GenericData.getField(GenericData.java:541)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:173)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:69)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:173)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:69)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)&#010;&gt;     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)&#010;&gt;&#010;&gt;My best guess is that since the "fields" field is a union, the&#010;&gt;representation of it in the generated class is an Object which Jackson&#010;&gt;happily throws whatever into.&#010;&gt;&#010;&gt;If I change my schema to explicitly use "int" instead of the "Json"&#010;&gt;type, it works fine for my test document&#010;&gt;&#010;&gt;         "type": {"type": "map", "values": ["string", "int", 
{"type":&#010;&gt;"map", "values": "int"}]}&#010;&gt;&#010;&gt;However now I need to enumerate the types for each level of nesting I&#010;&gt;want. This is not ideal, and limits me to a fixed level of nesting&#010;&gt;&#010;&gt;To be clear, my issue is not modelling my schema in Avro, but rather&#010;&gt;getting Jackson to map JSON onto the generated classes without too much&#010;&gt;pain. I have also tried&#010;&gt;https://github.com/FasterXML/jackson-dataformat-avro without much luck.&#010;&gt;&#010;&gt;Any help is appreciated&#010;&gt;&#010;&gt;-David&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;&#010;&gt;&#010;&#010;&#010;&#010;
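The ambiguity described in the reply above can be sketched in a few lines of
Python. This is only an illustration, not Avro code: the helper name
candidate_avro_types is hypothetical, and the mapping is an assumption that
covers just the alternatives raised in that reply (object: map or record;
integer: int, long, float, or double; string: string or bytes), plus the
unambiguous cases.

```python
import json


def candidate_avro_types(value):
    """List the Avro types a schema-less JSON value could stand for.

    Hypothetical helper for illustration; not part of any Avro library.
    """
    if value is None:
        return ["null"]
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return ["boolean"]
    if isinstance(value, int):
        return ["int", "long", "float", "double"]
    if isinstance(value, float):
        return ["float", "double"]
    if isinstance(value, str):
        return ["string", "bytes"]
    if isinstance(value, list):
        return ["array"]
    if isinstance(value, dict):
        return ["map", "record"]
    raise TypeError("unexpected JSON value: %r" % (value,))


# The example document from the thread.
doc = json.loads(
    '{"id": "doc1", "fields": {"foo": "bar", "spam": "eggs",'
    ' "answer": 42, "x": {"a": 1}}}'
)

# Collect the candidates for the top-level keys and for each entry of "fields".
ambiguities = {
    "id": candidate_avro_types(doc["id"]),
    "fields": candidate_avro_types(doc["fields"]),
}
for name, value in doc["fields"].items():
    ambiguities[name] = candidate_avro_types(value)

for name in sorted(ambiguities):
    print(name, "could be:", ", ".join(ambiguities[name]))
```

Every key in the example has more than one candidate, which is why a decoder
needs a schema (or a built-in assumption such as "maps, all the way down")
to choose one.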
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Best practices for java enums...?</title>
<author><name>Scott Carey &lt;scottcarey@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3cCDB6AA5D.EE8D6%25scott@richrelevance.com%3e"/>
<id>urn:uuid:%3cCDB6AA5D-EE8D6%25scott@richrelevance-com%3e</id>
<updated>2013-05-13T21:38:28Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
It would be nice to be able to reference an existing class when using the&#010;specific compiler.&#010;&#010;If you have an existing "com.mycompany.Foo" enum (or SpecificRecord, or&#010;Fixed type) and provide the specific compiler with the type prior to&#010;parsing the schema, it could accept a reference:&#010;&#010;{"type":"record", "name":"com.mycompany.Rec", "fields": [&#010;  {"name":"fooField", "type":"com.mycompany.Foo"}&#010;]}&#010;&#010;Ordinarily, this would fail to compile, but given a reference to an existing&#010;compatible type, such as an enum, it could work.&#010;&#010;-Scott&#010;&#010;On 5/9/13 4:39 PM, "Felix GV" &lt;felix@mate1inc.com&gt; wrote:&#010;&#010;&gt; Hello, &#010;&gt; &#010;&gt; I'm currently writing an avro schema which includes an enum field that I&#010;&gt; already have as a java enum in my application.&#010;&gt; &#010;&gt; At first, I named the avro field with the same fully qualified name (package&#010;&gt; name dot enum name) as my existing java enum. I then ran the avro compiler and&#010;&gt; found that it overwrote my existing java enum with an avro-generated enum.&#010;&gt; &#010;&gt; I find this slightly annoying because my java enum had comments documenting&#010;&gt; the purpose of each enum value, and the avro-generated enum doesn't have this.&#010;&gt; &#010;&gt; I see two or three potential solutions:&#010;&gt; 1. Replacing my current enum with the avro-generated one in my code&#010;&gt; base, which I feel I cannot document properly (since I have access to just one&#010;&gt; doc attribute for the whole enum, instead of per symbol). On a side note, I&#010;&gt; haven't found any way to have a multi-line doc attribute in an avro schema, so&#010;&gt; that makes things slightly more annoying still. 
I wouldn't mind settling on&#010;&gt; using the avro-generated enums without documentation per symbol if at least I&#010;&gt; could have one big doc/comment that documents all symbols at once, but since&#010;&gt; it seems the doc attribute must be a one-liner, this is starting to be a&#010;&gt; little too messy for my taste...&#010;&gt; 2. Maintaining two separate enums: my manually written (and documented) enum&#010;&gt; as well as the avro-generated enum. For now, I think this is what I'm going to&#010;&gt; do, because those enums have little chance of changing anyway, but from a&#010;&gt; maintenance standpoint, it seems pretty horrendous...&#010;&gt; 3. I guess there's a third way, which would involve creating a script that&#010;&gt; backs up my enums, compiles all my schemas, and then restores my backed up&#010;&gt; enums, but this also seems ultra messy :( ... I haven't tested if it'd work&#010;&gt; (since the manually written enum is missing the $SCHEMA field), but I guess it&#010;&gt; would... &#010;&gt; Am I being OCD about this? or is this a concern that others have bumped into?&#010;&gt; How do you guys deal with this? Did I miss anything in the way avro works?&#010;&gt; &#010;&gt; P.S.: I've seen that reflect mappings may be able to work with arbitrary java&#010;&gt; enums, but since they seemed discouraged for performance reasons, I haven't&#010;&gt; dug much in this direction. I'd like to keep using .avsc files if possible,&#010;&gt; but if there's a better way, I can certainly try it.&#010;&gt; &#010;&gt; P.P.S.: We're currently using avro 1.6.1, but if the latest version provides a&#010;&gt; nice way of handling my use case, then I guess I could get us to upgrade...&#010;&gt; &#010;&gt; Thanks a lot :) !&#010;&gt; &#010;&gt; --&#010;&gt; Felix&#010;&#010;&#010;&#010;
</pre>
</div>
</content>
</entry>
</feed>
