pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1595) casting relation to scalar- problem with handling of data from non PigStorage loaders
Date Fri, 03 Sep 2010 17:58:35 GMT

     [ https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1595:

    Attachment: PIG-1595.1.patch

- This patch adds a new subclass of DependencyOrderWalker, DependencyOrderWalkerLPScalar.
When it chooses the sink nodes of the plan from which to start the walk, it takes them in
the order determined by the dependencies that ReadScalars introduces.
- The LOCast that was being added after ReadScalars to get the expected type is no longer
necessary and has been removed.
- PigServer.mergeScalars() now also checks whether the LOStore that the code attempts to
re-use has the same store function, InterStorage, which the ReadScalars udf uses to read
its input.
- No new unit test case has been added: TestScalarAliases.testFilteredScalarDollarProj,
which used to fail without the additional cast, now succeeds without it.
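
To illustrate the ordering issue, here is a minimal, self-contained sketch. It is not the code in the patch: it uses a generic topological sort (Kahn's algorithm) instead of Pig's walker classes, and the node names and the ScalarDependencyOrder class are hypothetical. It only shows that once the edge from the scalar's input to the operator using ReadScalars is made explicit, the input is always visited first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Pig.
final class ScalarDependencyOrder {

    /** Records that 'from' must be visited before 'to'. */
    static void addEdge(Map<String, List<String>> successors, String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Kahn's algorithm: a node is emitted only after all of its predecessors. */
    static List<String> dependencyOrder(Map<String, List<String>> successors) {
        // Count the not-yet-visited predecessors of every node.
        Map<String, Integer> pending = new HashMap<>();
        for (String node : successors.keySet()) {
            pending.put(node, 0);
        }
        for (List<String> targets : successors.values()) {
            for (String target : targets) {
                pending.merge(target, 1, Integer::sum);
            }
        }

        // Start from the nodes that have no predecessors.
        Deque<String> ready = new ArrayDeque<>();
        for (String node : successors.keySet()) {
            if (pending.get(node) == 0) {
                ready.add(node);
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String target : successors.get(node)) {
                if (pending.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("cycle in plan");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        addEdge(plan, "loadB", "foreachWithReadScalars");
        addEdge(plan, "foreachWithReadScalars", "storeB");
        addEdge(plan, "loadA", "storeScalarInput");

        // Plan edges only: nothing forces the scalar's input to come first.
        System.out.println(dependencyOrder(plan));

        // Make the scalar dependency explicit and order again.
        addEdge(plan, "storeScalarInput", "foreachWithReadScalars");
        System.out.println(dependencyOrder(plan));
    }
}
```

With only the plan edges, the operator that uses ReadScalars can be visited before the store feeding it. With the extra edge it cannot.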

Unit tests have passed. Test-patch results are pasted below. Patch is ready for review.
     [exec] -1 overall.
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

> casting relation to scalar- problem with handling of data from non PigStorage loaders
> -------------------------------------------------------------------------------------
>                 Key: PIG-1595
>                 URL: https://issues.apache.org/jira/browse/PIG-1595
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>         Attachments: PIG-1595.1.patch
> If load functions that don't follow the same bytearray format as PigStorage for other
supported datatypes, or that don't implement the LoadCaster interface, are used in 'casting
relation to scalar' (PIG-1434), the query can fail or produce incorrect results.
> The root cause of the problem is that there is a real dependency between the ReadScalars
udf that returns the scalar value and the LogicalOperator that acts as its input, but the
logical plan does not capture this dependency. As a result, the SchemaResetter visitor used
by the optimizer does not take it into account when choosing the order in which schemas are
reset and evaluated. If the schema of the input LogicalOperator is not evaluated before the
ReadScalars udf, the result type of the ReadScalars udf becomes bytearray. POUserFunc then
converts the input to a bytearray using 'new DataByteArray(inp.toString().getBytes())'. But
this bytearray encoding of the other supported types might not be the same as the one used by
the LoadFunction associated with the column, and that can result in problems.
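
The encoding mismatch can be sketched without any Pig classes. The example below is an illustration under stated assumptions, not Pig code: textEncode mimics the 'inp.toString().getBytes()' conversion, while binaryEncodeInt and binaryDecodeInt stand in for a hypothetical loader that stores ints as 4 big-endian bytes. All names are made up for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Pig.
final class ByteArrayEncodingMismatch {

    /** Text encoding of a value, analogous to inp.toString().getBytes(). */
    static byte[] textEncode(Object value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Assumed loader format: an int stored as 4 big-endian bytes. */
    static byte[] binaryEncodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Decodes bytes the way the assumed binary loader would. */
    static int binaryDecodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        int scalar = 1234;
        byte[] asText = textEncode(scalar);          // bytes of the string "1234"
        byte[] asBinary = binaryEncodeInt(scalar);   // 0x00 0x00 0x04 0xD2

        // The two encodings of the same value differ.
        System.out.println(Arrays.equals(asText, asBinary)); // prints false

        // Reading the text bytes with the binary decoder yields a wrong value.
        System.out.println(binaryDecodeInt(asText)); // prints 825373492
    }
}
```

A caster that expects one format and receives the other either fails or silently returns a wrong value, which corresponds to the two failure modes described above.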

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
