hadoop-pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Implicit casting on bag operators
Date Fri, 09 May 2008 14:40:50 GMT
Union is an example of a bag (relational) operator that can have more than
one input.

If the schemas from all the input ports are the same, there is no problem.
If the schemas from the input ports are not compatible, there is also no
problem, because we won't process the query at all.
If the schemas from the input ports are not the same but are compatible,
we have a problem.
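
The three cases above can be sketched as a small classification over column types. This is only an illustration, not Pig's actual type checker: the class UnionSchemaCheck, the ColType and Relation enums, and the rule that any two numeric types are compatible are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: classify how the schemas of two union inputs relate. */
final class UnionSchemaCheck {
    // Simplified stand-ins for Pig column types (not Pig's real type constants).
    enum ColType { INT, LONG, FLOAT, DOUBLE, CHARARRAY }

    enum Relation { SAME, COMPATIBLE, INCOMPATIBLE }

    // Assumption: numeric types are mutually compatible; chararray only matches itself.
    static boolean isNumeric(ColType t) {
        return t != ColType.CHARARRAY;
    }

    static Relation compare(List<ColType> a, List<ColType> b) {
        if (a.size() != b.size()) {
            return Relation.INCOMPATIBLE;
        }
        boolean identical = true;
        for (int i = 0; i < a.size(); i++) {
            ColType x = a.get(i);
            ColType y = b.get(i);
            if (x == y) {
                continue;
            }
            if (isNumeric(x) && isNumeric(y)) {
                identical = false; // types differ, but one can be widened to the other
            } else {
                return Relation.INCOMPATIBLE;
            }
        }
        return identical ? Relation.SAME : Relation.COMPATIBLE;
    }

    public static void main(String[] args) {
        List<ColType> a = Arrays.asList(ColType.INT, ColType.CHARARRAY);
        List<ColType> b = Arrays.asList(ColType.DOUBLE, ColType.CHARARRAY);
        System.out.println(compare(a, b));
    }
}
```

Only the COMPATIBLE result needs extra work; SAME can pass straight through and INCOMPATIBLE is rejected before execution.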



Schema(A) = < Int, Chararray >
Schema(B) = < Double, Chararray >

The output schema gets resolved to < Double, Chararray >, and that is where
the problem lies: the Union operator currently doesn't support casting in
any layer. If we don't cast, the downstream operator will pick up the binary
data of an Int as a Double! There are a few possible solutions:

1) Implement LOUnion and POUnion to support type casting internally.
2) Add casting support to the LOUnion operator and let the LogicalToPhysical
compiler generate an LOForEach for it.
3) Explicitly insert an LOForEach between Union and each problematic input
to do the necessary casting. This is analogous to the way we implement
implicit casting for expression operators.
4) Don't support the "not same but compatible" case at all.
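
To make option 3 concrete, here is a rough sketch of the idea using plain Java collections rather than Pig's operator classes. Everything in it is hypothetical: CastBeforeUnion, ColType, castAll and union are illustrative names, rows are modelled as Object[] arrays, and only the Int to Double widening from the example schemas is handled. The point is that the union stays cast-free while a separate per-input step (playing the role of the inserted ForEach) brings each input to the resolved schema first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of option 3: a per-input cast step placed in front of the union. */
final class CastBeforeUnion {
    // Simplified stand-ins for Pig types (not Pig's real type constants).
    enum ColType { INT, DOUBLE, CHARARRAY }

    // Widen one value to the target column type; only Int -> Double is needed here.
    static Object cast(Object value, ColType from, ColType to) {
        if (from == to) {
            return value;
        }
        if (from == ColType.INT && to == ColType.DOUBLE) {
            return ((Integer) value).doubleValue();
        }
        throw new IllegalArgumentException("No implicit cast from " + from + " to " + to);
    }

    // Plays the role of the inserted ForEach: casts every column of every row.
    static List<Object[]> castAll(List<Object[]> rows, ColType[] from, ColType[] to) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            Object[] converted = new Object[row.length];
            for (int i = 0; i < row.length; i++) {
                converted[i] = cast(row[i], from[i], to[i]);
            }
            out.add(converted);
        }
        return out;
    }

    // The union itself needs no casting logic: it only concatenates its inputs.
    static List<Object[]> union(List<Object[]> a, List<Object[]> b) {
        List<Object[]> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Builds the example from the message: A = <Int, Chararray>, B = <Double, Chararray>.
    static List<Object[]> demo() {
        ColType[] schemaA = {ColType.INT, ColType.CHARARRAY};
        ColType[] schemaB = {ColType.DOUBLE, ColType.CHARARRAY};
        ColType[] target = schemaB; // resolved output schema <Double, Chararray>

        List<Object[]> a = new ArrayList<>();
        a.add(new Object[] {1, "x"});
        List<Object[]> b = new ArrayList<>();
        b.add(new Object[] {2.5, "y"});

        return union(castAll(a, schemaA, target), castAll(b, schemaB, target));
    }

    public static void main(String[] args) {
        for (Object[] row : demo()) {
            System.out.println(row[0].getClass().getSimpleName() + " " + row[0] + " " + row[1]);
        }
    }
}
```

With the cast step in place, every value in the first column reaching the downstream operator is a Double, so the operator can trust the resolved schema. An input whose schema already matches the target passes through castAll unchanged.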

I will do (3) because it makes the most sense to me and has the least
impact on other modules. Does anyone have a problem with that?

