hadoop-pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Implicit casting on bag operators
Date Tue, 13 May 2008 15:24:34 GMT
I agree that option 3 is the correct course.

One note, you say:

In case that schemas from all the input ports are not compatible, no problem
because we won't process it.

What do you mean by "won't process it"? We still have to allow a union
operation between two incompatible inputs (otherwise we could only use
union when we have schemas). But the resulting union will not have a
schema, since its output no longer has a consistent one.
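
As a rough illustration of the three cases (same, compatible, incompatible), here is a minimal sketch in Python. The type names, the widening order, and the functions `merge_types` and `union_schema` are assumptions made for this example and are not Pig's actual type-checking code. An incompatible union returns `None`, standing in for "no schema".

```python
# Hypothetical numeric widening order, assumed for illustration only.
NUMERIC_ORDER = ["int", "long", "float", "double"]


def merge_types(a, b):
    """Return the common type of a and b, or None if they are incompatible."""
    if a == b:
        return a
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        # Widen to whichever type appears later in the numeric order.
        return max(a, b, key=NUMERIC_ORDER.index)
    return None


def union_schema(schemas):
    """Merge input schemas column by column; None means the union has no schema."""
    merged = list(schemas[0])
    for schema in schemas[1:]:
        if len(schema) != len(merged):
            return None
        for i, column_type in enumerate(schema):
            common = merge_types(merged[i], column_type)
            if common is None:
                return None
            merged[i] = common
    return merged
```

In this sketch the union itself is still allowed when the inputs are incompatible; only the resulting schema is dropped.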

Alan.


pi song wrote:
> Union is an example of bag (relational) operators that can have more than
> one input.
>
> In case that schemas from all the input ports are the same, no problem.
> In case that schemas from all the input ports are not compatible, no problem
> because we won't process it.
> In case that schemas from all the input ports are not the same, but
> compatible, here comes a problem.
>
> Example:
>
> C = UNION A,B ;
>
> Schema(A) = < Int, Chararray >
> Schema(B) = < Double, Chararray >
>
> The output schema will get resolved to < Double, Chararray >. Here is the
> problem. The Union operator at the moment doesn't support casting in any
> layer. In this case, if we don't cast it, the binary data of Int will get
> picked up as Double by the downstream operator! There are a couple of
> solutions for this:
>
> 1) Implement LOUnion and POUnion to support type casting internally
> 2) Add casting support in the LOUnion operator and let the LogicalToPhysical
> compiler generate an LOForEach for it.
> 3) Explicitly insert LOForEach to do necessary casting between Union and the
> problematic input. This is analogous to the way we implement implicit
> casting for expression operators.
> 4) Don't support "not same but compatible" case at all.
>
> I will do (3) because it makes the most sense to me and incurs the least
> impact on other modules. Does anyone have a problem with it?
>
> Pi
>
>   
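
Option (3) from the quoted message can be sketched roughly as follows, in Python rather than Pig's own Java. The helper names (`casts_needed`, `foreach_cast`, `union_with_casts`) and the `CASTERS` table are hypothetical. They stand in for an inserted LOForEach and do not reflect the real operator classes. The idea is that each input whose schema differs from the resolved output schema gets a cast step placed in front of the union, so the union itself never has to cast.

```python
# Hypothetical mapping from type names to Python conversions (an assumption).
CASTERS = {"int": int, "long": int, "double": float, "chararray": str}


def casts_needed(input_schema, output_schema):
    """Map column index -> target type for columns whose type must change."""
    return {
        i: out_type
        for i, (in_type, out_type) in enumerate(zip(input_schema, output_schema))
        if in_type != out_type
    }


def foreach_cast(rows, casts):
    """Stand-in for an inserted foreach: cast the listed columns of each row."""
    for row in rows:
        yield tuple(
            CASTERS[casts[i]](value) if i in casts else value
            for i, value in enumerate(row)
        )


def union_with_casts(inputs, output_schema):
    """inputs is a list of (schema, rows); casts are applied only where needed."""
    result = []
    for schema, rows in inputs:
        casts = casts_needed(schema, output_schema)
        result.extend(foreach_cast(rows, casts) if casts else rows)
    return result


# The example from the message: A = <Int, Chararray>, B = <Double, Chararray>.
A = (["int", "chararray"], [(1, "a")])
B = (["double", "chararray"], [(2.5, "b")])
C = union_with_casts([A, B], ["double", "chararray"])
```

Here only input A passes through the cast step; B already matches the output schema and is passed through unchanged.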
