pig-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: comments appreciated for pig AvroStorage
Date Wed, 01 Dec 2010 23:47:50 GMT
Yes, you are right.  I have not added general schema read support yet.  It is a work in progress.

There are many similarities, but I have not looked at your source code yet.   It would be
best if we can merge all of these in time, one way or another.  I would have liked to have
gotten feedback in the AVRO-592 JIRA about other use cases and needs if you were aware of
it.  Perhaps we could have avoided duplication of effort.


On Dec 1, 2010, at 1:07 PM, Lin Guo wrote:

> Yes, we are well aware of the two jiras (refer to the related work
> section of the doc in
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data).
> We couldn't use the patch in AVRO-592 because we want to process avro
> data generated by our tracking system (or any arbitrary avro data)
> which is not supported by the patch.
> ============ from AVRO-592 ====================
> The current restriction is that you can't read an arbitrary Avro
> record and make a Tuple out of it, even though the total number of
> possible Avro schemas that can be coerced into a Tuple is much larger
> than supported, I wanted to support that in a separate place.
> =============================================
> Best,
> Lin
> On Wed, Dec 1, 2010 at 10:16 AM, Scott Carey <scott@richrelevance.com> wrote:
>> There are two other JIRAs with alternate Avro<-->Pig implementations
>> with different feature sets.
>> https://issues.apache.org/jira/browse/PIG-794 aims to use Avro
>> internally within Pig for efficiency, including intermediate
>> serialization.
>> https://issues.apache.org/jira/browse/AVRO-592 has the same goals that
>> your patch does, but has fewer restrictions on what can and can't be
>> written/read. It supports writing any Pig schema and reading it back
>> in, but only reading a subset of Avro schemas (non-recursive; I may add
>> unions later). With a little more work it could support intermediate
>> serialization for pig as well. Longer term goals include being able to
>> use AvroStorage along with a Hive AvroSerDe on the same data,
>> supporting projection, and supporting partitioning.
>> I've been hoping to finish up AVRO-592 but am currently busy with other things.
>> -Scott
>> On Nov 30, 2010, at 9:05 PM, Lin Guo wrote:
>>> Hi,
>>> We'd like to patch our pig AvroStorage function and
>>> would highly appreciate any kinds of comments.
>>> doc:
>>> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
>>> jira:
>>> https://issues.apache.org/jira/browse/PIG-1748
>>> Many thanks,
>>> Lin
