pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cheolsoo Park" <cheol...@cloudera.com>
Subject Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage
Date Wed, 19 Sep 2012 00:10:51 GMT


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > Have a few comments.

Thank you very much for reviewing my patch, Santhosh! Please find responses inline.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
line 199
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line199>
> >
> >     Why aren't float and double returning float and double respectively?

I am worried about loss of precision by converting a big positive/negative int to float/double.
Do you disagree?


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
line 210
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line210>
> >
> >     Why aren't float and double returning float and double respectively?

Same as above.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
line 221
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line221>
> >
> >     Why aren't int and long returning float ?

Same as above.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
line 232
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line232>
> >
> >     Why aren't int and long returning double ?

Same as above.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
line 273
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line273>
> >
> >     Is the equality applicable all the way through?

No, this is just an enum equality check. However, that's sufficient here since mergeType()
is meant to merge primitive types only.

I added a comment regarding what this method is meant for.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java,
line 144
> > <https://reviews.apache.org/r/6884/diff/3/?file=154467#file154467line144>
> >
> >     Are the size of t and mProtoTuple expected to match?

No, they are not always expected to match because the size of mProtoTuple is equal to that
of merged schema while the size of t is equal to that of individual input schemas.

In fact, I realized that this loop is redundant after all because the initialization of mProtoTuple
is already done inside the constructor. So I removed this.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java,
line 93
> > <https://reviews.apache.org/r/6884/diff/3/?file=154467#file154467line93>
> >
> >     This will append to the end of the list resulting in a list size of 2*tupleSize.
Was that your intention?

No, that's not my intention.

But I believe that initializing the capability of ArrayList doesn't initialize its size. The
size seems to be 0 until an entry is actually added to it.

Nevertheless, I changed it to add(i, null), so now it's more clear.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java,
line 184
> > <https://reviews.apache.org/r/6884/diff/3/?file=154469#file154469line184>
> >
> >     Why isn't bytes + other non-null type = bytes ?

My reasoning is that bytes is just blob, so its type is unknown; therefore, the merged type
is unknown. I wanted to be conservative in terms of type converting to avoid run-time exceptions.

But please let me know if you think otherwise.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
line 132
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line132>
> >
> >     Is it multiple_schema or multiple_schemas ?

It should be multiple_schemas. Fixed.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
line 311
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line311>
> >
> >     Any reason for removing this here and not in other places?

I put it back.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
line 362
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line362>
> >
> >     The logic used here is close to the logic used in setLocation. Is it possible
to refactor the code and move this to a utility method?

Their logic is very similar, but there is a difference. At a high level, they'are like this:

// setLocation
if (a) {
  do x;
  do y;
}

// getSchema
if (a) {
  do x;
}

where

a: getAllSubDirs()
x: initialize inputAvroSchema
y: call FileInputFormat.setInputPaths()

I factored out the code for initializing inputAvroSchema into a method setInputAvroSchema(),
but I found it hard to refactor more than this. Please let me know if you have any suggestion.

By the way, while thinking about this, I eliminated getConcretePathFromGlob(). Originally,
getConcretePathFromGlob() was called by getAvroSchema() to load the schema from the first
file that matches the glob pattern. But since we build 'paths' by calling getAllSubDirs()
anyway, it seems better to resue 'paths' inside getAvroSchema().


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
line 130
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line130>
> >
> >     Can these string constants be declared to make them reusable and more readable?


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review11566
-----------------------------------------------------------


On Sept. 19, 2012, 12:10 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2012, 12:10 a.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's
original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message