crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Invalid schema created in AvrosTest#testNestedTables
Date Tue, 03 Jul 2012 14:26:15 GMT
On Tue, Jul 3, 2012 at 12:42 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
> Hi guys,
>
> While implementing the map side joins, I needed to make the PType
> interface extend Serializable, and that caused me to stumble upon an
> issue in AvrosTest. The testNestedTables method creates a nested table
> schema (unsurprisingly), with the call in question being equivalent to
> this:
>
>     Avros.tableOf(Avros.strings(), Avros.tableOf(Avros.ints(), Avros.doubles()));
>
> This results in an invalid schema being created due to the same
> namespace and name (org.apache.avro.mapred.Pair) being used twice in
> the schema.
>
> The error with the invalid schema occurs in the Schema#toString
> method -- it should probably result in an exception during the
> creation of the Schema itself, but the toString method is used
> everywhere, so this will fail if it's used no matter what.
>
> Does anyone know whether nested Avro tables are a real use case that needs to
> be supported (seeing as they won't actually work right now)? Am I
> right in assuming that pretty much the same thing could be
> accomplished by just doing the following call?
>
>     Avros.tableOf(Avros.strings(), Avros.pairs(Avros.ints(), Avros.doubles()));
>
> I know that unique names are created for Avro Tuple schemas in Crunch,
> and I assume that this is done to avoid name collisions like this
> case. I'm thinking that we could do some kind of similar trick to
> allow the nested tables to work, but I don't think that this is worth
> the effort (unless someone says that nested tables are a real use case
> that needs to be supported).
>
> I'll report the late catching of the invalid Schema to the Avro
> project, but unless anyone objects, I think we can probably just
> remove this test case. Anyone against that idea?

I added it as a convenience method for a project I was working on -- my
thought was that if you tried to put a PTableType inside of another
PTableType, we should detect it and automatically convert the nested
PTableType to a pairs(keyType, valueType), as you indicated was the
right thing to do. I'm surprised that it didn't work properly, since I
thought that I was doing the conversion from PTableType to
pairs(keyType, valueType) when the PType was being constructed. Do you
mind if I take a look at it first and try to fix it?
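
For illustration, the conversion described above could look roughly like the sketch below. It is a toy model, not Crunch's code: Type, Primitive, PairType, TableType and unwrapTable are hypothetical stand-ins invented for this example, and unwrapping a table used as a key (not only as a value) is an assumption. The idea is only that a table type given as a component of another table is replaced by a pair of its key and value types at construction time.

```java
public class NestedTableSketch {

    /** Toy stand-in for a PType; hypothetical, not Crunch's real interface. */
    interface Type {
        String describe();
    }

    /** A leaf type such as string, int or double. */
    static final class Primitive implements Type {
        final String name;

        Primitive(String name) {
            this.name = name;
        }

        public String describe() {
            return name;
        }
    }

    /** Toy analogue of pairs(first, second). */
    static final class PairType implements Type {
        final Type first;
        final Type second;

        PairType(Type first, Type second) {
            this.first = first;
            this.second = second;
        }

        public String describe() {
            return "pair<" + first.describe() + "," + second.describe() + ">";
        }
    }

    /** Toy analogue of tableOf(key, value). */
    static final class TableType implements Type {
        final Type key;
        final Type value;

        TableType(Type key, Type value) {
            // Assumption: nested tables are converted when the outer type is built.
            this.key = unwrapTable(key);
            this.value = unwrapTable(value);
        }

        public String describe() {
            return "table<" + key.describe() + "," + value.describe() + ">";
        }
    }

    /** Replaces a table type with the equivalent pair type; other types pass through. */
    static Type unwrapTable(Type t) {
        if (t instanceof TableType) {
            TableType table = (TableType) t;
            return new PairType(table.key, table.value);
        }
        return t;
    }

    static Type tableOf(Type key, Type value) {
        return new TableType(key, value);
    }

    public static void main(String[] args) {
        Type nested = tableOf(
                new Primitive("string"),
                tableOf(new Primitive("int"), new Primitive("double")));
        // prints "table<string,pair<int,double>>"
        System.out.println(nested.describe());
    }
}
```

With this shape the outer table never holds a second table-typed component, so only one table-level record would appear in whatever schema is derived from it.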

>
>
> - Gabriel
