hadoop-pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Re: Implicit casting on bag operators
Date Tue, 13 May 2008 23:06:06 GMT
Ok, will follow that.

On 5/14/08, Alan Gates <gates@yahoo-inc.com> wrote:
>
> I agree that option 3 is the correct course.
>
> One note, you say:
>
> In case that schemas from all the input ports are not compatible, no
> problem because we won't process it.
>
> What do you mean by "won't process it"?  We still have to allow a union
> operation between two non-compatible inputs (otherwise we could only use
> union when we have schemas).  But the resulting union will not have a schema
> (since the output no longer has a consistent schema).
>
> Alan.
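
The three cases discussed in this thread (identical, compatible, and
incompatible input schemas) can be sketched roughly as below. This is only an
illustration in Python, not Pig code: the names NUMERIC_RANK, merge_field and
merge_union_schemas are hypothetical, and the widening order is an assumed one
rather than Pig's actual type table. An incompatible pair yields None, standing
in for "the union has no schema".

```python
from typing import List, Optional

# Hypothetical numeric widening order (an assumption, not Pig's real type table).
NUMERIC_RANK = {"int": 0, "long": 1, "float": 2, "double": 3}


def merge_field(a: str, b: str) -> Optional[str]:
    """Return the common type of two fields, or None if they are incompatible."""
    if a == b:
        return a
    if a in NUMERIC_RANK and b in NUMERIC_RANK:
        # Both numeric: widen to whichever type ranks higher.
        return a if NUMERIC_RANK[a] > NUMERIC_RANK[b] else b
    return None


def merge_union_schemas(schemas: List[List[str]]) -> Optional[List[str]]:
    """Merge the input schemas of a union (at least one input is assumed).

    Returns the merged schema, or None when the inputs are not compatible,
    meaning the union is still allowed but its output carries no schema.
    """
    merged = list(schemas[0])
    for schema in schemas[1:]:
        if len(schema) != len(merged):
            return None
        for i, field in enumerate(schema):
            common = merge_field(merged[i], field)
            if common is None:
                return None
            merged[i] = common
    return merged


same = merge_union_schemas([["int", "chararray"], ["int", "chararray"]])
compatible = merge_union_schemas([["int", "chararray"], ["double", "chararray"]])
incompatible = merge_union_schemas([["int", "chararray"], ["chararray", "int"]])
print(same, compatible, incompatible)
```

Only the middle case needs extra work: the merged schema differs from one of
the inputs, so that input has to be cast before its rows reach the union.
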
>
>
> pi song wrote:
>
> > Union is an example of a bag (relational) operator that can have more
> > than one input.
> >
> > If the schemas from all the input ports are the same, no problem.
> > If the schemas from all the input ports are not compatible, no problem,
> > because we won't process it.
> > If the schemas from all the input ports are not the same but are
> > compatible, here comes a problem.
> >
> > Example:
> >
> > C = UNION A,B ;
> >
> > Schema(A) = < Int, Chararray >
> > Schema(B) = < Double, Chararray >
> >
> > The output schema will get resolved to < Double, Chararray >. Here is the
> > problem: the Union operator at the moment doesn't support casting in any
> > layer. If we don't cast, the binary data of the Int will get picked up as
> > a Double by the downstream operator!! There are a few possible solutions
> > for this:
> >
> > 1) Implement LOUnion and POUnion to support type casting internally.
> > 2) Add casting support in the LOUnion operator and let the
> > LogicalToPhysical compiler generate an LOForEach for it.
> > 3) Explicitly insert an LOForEach to do the necessary casting between the
> > Union and the problematic input. This is analogous to the way we implement
> > implicit casting for expression operators.
> > 4) Don't support the "not same but compatible" case at all.
> >
> > I will do (3) because it makes the most sense to me and incurs the least
> > impact on other modules. Does anyone have a problem with it?
> >
> > Pi
> >
> >
> >
>
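
For reference, option 3 from the quoted proposal could look roughly like the
sketch below. It is a Python illustration under stated assumptions, not the
actual Pig implementation: CASTS, cast_foreach and union_with_casts are
hypothetical names, the cast table is deliberately tiny, and the target schema
is taken as already resolved to < Double, Chararray >. The idea shown is that
only an input whose schema differs from the target gets a casting projection
placed in front of it, while the union itself stays cast-free.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]

# Hypothetical cast table keyed by (source type, target type); illustration only.
CASTS: Dict[Tuple[str, str], Callable[[object], object]] = {
    ("int", "long"): int,
    ("int", "double"): float,
    ("long", "double"): float,
}


def cast_foreach(schema: List[str], target: List[str]) -> Callable[[Row], Row]:
    """Build the per-row projection that an inserted casting foreach would apply."""
    converters: List[Callable[[object], object]] = []
    for src, dst in zip(schema, target):
        if src == dst:
            converters.append(lambda v: v)  # field already has the target type
        else:
            converters.append(CASTS[(src, dst)])
    return lambda row: tuple(f(v) for f, v in zip(converters, row))


def union_with_casts(inputs: List[Tuple[List[str], List[Row]]],
                     target: List[str]) -> List[Row]:
    """Union the inputs, casting only those whose schema differs from target."""
    out: List[Row] = []
    for schema, rows in inputs:
        if schema == target:
            out.extend(rows)  # no foreach needed on this input
        else:
            project = cast_foreach(schema, target)
            out.extend(project(row) for row in rows)
    return out


a = (["int", "chararray"], [(1, "x"), (2, "y")])   # Schema(A) = < Int, Chararray >
b = (["double", "chararray"], [(3.5, "z")])        # Schema(B) = < Double, Chararray >
c = union_with_casts([a, b], ["double", "chararray"])
print(c)
```

Keeping the cast in a separate step in front of the union mirrors the point
made for option 3: the union operator needs no changes, and downstream
operators only ever see values of the resolved type.
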
