crunch-dev mailing list archives

From Gabriel Reid
Subject Invalid schema created in AvrosTest#testNestedTables
Date Tue, 03 Jul 2012 07:42:03 GMT
Hi guys,

While implementing the map-side joins, I needed to make the PType
interface extend Serializable, and that caused me to stumble upon an
issue in AvrosTest. The testNestedTables method creates a nested table
schema (unsurprisingly), with the call in question being equivalent to

    Avros.tableOf(Avros.strings(), Avros.tableOf(Avros.ints(), Avros.doubles()));

This results in an invalid schema, because the same full name
(namespace plus name, org.apache.avro.mapred.Pair) is used for two
different record types in the schema.

The error with the invalid schema only shows up in the Schema#toString
method. It should probably result in an exception when the Schema
itself is created, but because toString is used everywhere, the
invalid schema will cause a failure as soon as it is used, no matter
what.
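To make the failure mode concrete, here is a self-contained toy model in
Java. It is NOT the Avro API: ToySchema, pair() and render() are
hypothetical stand-ins, assuming (as described above) that a record full
name may only be defined once per schema and that the check only happens
when the schema is rendered to a string. Building the nested pair
succeeds; converting it to a string fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the Avro API: a full name may be defined only once per
// schema, and (like the toString behaviour described above) the check
// happens while rendering, not while the schema objects are built.
public class NestedPairNames {

  static final class ToySchema {
    final String name;      // primitive type name, or record full name
    final boolean isRecord;
    final List<String> fieldNames = new ArrayList<String>();
    final List<ToySchema> fieldTypes = new ArrayList<ToySchema>();

    private ToySchema(String name, boolean isRecord) {
      this.name = name;
      this.isRecord = isRecord;
    }

    static ToySchema primitive(String name) {
      return new ToySchema(name, false);
    }

    // Hypothetical helper: a pair record with "key" and "value" fields.
    // Construction performs no name validation, so it never fails.
    static ToySchema pair(String fullName, ToySchema key, ToySchema value) {
      ToySchema s = new ToySchema(fullName, true);
      s.fieldNames.add("key");
      s.fieldTypes.add(key);
      s.fieldNames.add("value");
      s.fieldTypes.add(value);
      return s;
    }

    // Renders JSON-like text, tracking every record full name defined so far.
    String render(Map<String, ToySchema> defined) {
      if (!isRecord) {
        return "\"" + name + "\"";
      }
      ToySchema previous = defined.get(name);
      if (previous == this) {
        return "\"" + name + "\"";  // same record object seen again: emit a name reference
      }
      if (previous != null) {
        throw new IllegalStateException("Can't redefine: " + name);
      }
      defined.put(name, this);
      StringBuilder sb = new StringBuilder();
      sb.append("{\"type\":\"record\",\"name\":\"").append(name).append("\",\"fields\":[");
      for (int i = 0; i < fieldNames.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append("{\"name\":\"").append(fieldNames.get(i)).append("\",\"type\":")
            .append(fieldTypes.get(i).render(defined)).append('}');
      }
      return sb.append("]}").toString();
    }

    @Override
    public String toString() {
      return render(new HashMap<String, ToySchema>());
    }
  }

  public static void main(String[] args) {
    String pairName = "org.apache.avro.mapred.Pair";
    ToySchema inner = ToySchema.pair(pairName, ToySchema.primitive("int"), ToySchema.primitive("double"));
    // Building the nested table succeeds...
    ToySchema outer = ToySchema.pair(pairName, ToySchema.primitive("string"), inner);
    try {
      System.out.println(outer);  // ...but rendering it does not
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());  // Can't redefine: org.apache.avro.mapred.Pair
    }
  }
}
```

Giving the inner record any other full name lets toString() succeed in
this model, which is presumably why a pair with its own generated name
avoids the problem.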

Does anyone know whether nested Avro tables are a real use case that
needs to be supported (seeing as they don't actually work right now)?
Am I right in assuming that pretty much the same thing could be
accomplished with the following call?

    Avros.tableOf(Avros.strings(), Avros.pairs(Avros.ints(), Avros.doubles()));

I know that unique names are created for Avro Tuple schemas in Crunch,
and I assume that this is done to avoid name collisions like this one.
I'm thinking that we could use a similar trick to allow nested tables
to work, but I don't think that this is worth the effort (unless
someone says that nested tables are a real use case that needs to be
supported).
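A similar trick for nested tables could look roughly like the following
sketch. It is hypothetical throughout: the example.crunch namespace,
freshPairName(), pairRecord() and namesAreUnique() are illustrative
names, and this is not how Crunch actually names its Tuple schemas. The
idea is only that every generated pair record gets its own full name
(here a counter suffix), so nesting one inside another can never
redefine a name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "unique name" trick: each generated pair
// record gets a distinct full name, so nested pairs cannot collide.
// The namespace and helpers are illustrative, not Crunch's real scheme.
public class UniquePairNames {
  private static final AtomicInteger COUNTER = new AtomicInteger();

  // Matches the full name of each record definition in the JSON we emit.
  private static final Pattern RECORD_NAME =
      Pattern.compile("\"type\":\"record\",\"name\":\"([^\"]+)\"");

  // Returns a fresh full name such as "example.crunch.Pair0".
  static String freshPairName() {
    return "example.crunch.Pair" + COUNTER.getAndIncrement();
  }

  // Builds a two-field record schema; key and value types are given as JSON.
  static String pairRecord(String keyJson, String valueJson) {
    return "{\"type\":\"record\",\"name\":\"" + freshPairName() + "\",\"fields\":["
        + "{\"name\":\"key\",\"type\":" + keyJson + "},"
        + "{\"name\":\"value\",\"type\":" + valueJson + "}]}";
  }

  // True if no record full name is defined more than once in the schema text.
  static boolean namesAreUnique(String schemaJson) {
    Matcher m = RECORD_NAME.matcher(schemaJson);
    Set<String> seen = new HashSet<String>();
    while (m.find()) {
      if (!seen.add(m.group(1))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Nested table string -> (int, double), each level with its own record name.
    String inner = pairRecord("\"int\"", "\"double\"");
    String outer = pairRecord("\"string\"", inner);
    System.out.println(outer);
    System.out.println(namesAreUnique(outer));  // true
  }
}
```

Counter-based names differ from run to run, which could matter if
schemas are compared across processes; a deterministic name derived from
the component types would avoid that, but identical nested types would
then have to share a single definition.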

I'll report the late detection of the invalid Schema to the Avro
project, but unless anyone objects, I think we can probably just
remove this test case. Anyone against that idea?

- Gabriel
