Message-ID: <6756926.166381280954657723.JavaMail.jira@thor>
Date: Wed, 4 Aug 2010 16:44:17 -0400 (EDT)
From: "Thejas M Nair (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-1536) use same logic for merging inner schemas in "default union" and "union onschema"
In-Reply-To: <7378813.166311280954536057.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/PIG-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895410#action_12895410 ]

Thejas M Nair commented on PIG-1536:
------------------------------------

The way 'default union' deals with columns of different but compatible types in the same position is not right. It creates a merged schema with a merged type, but no cast is added to convert the rows to that type. For example:

{code}
grunt> l1 = load '/tmp/f1' as (a : chararray, t (a : int, c : long) );
grunt> l2 = load '/tmp/f1' as (a : chararray, t (a : int, b : int) );
grunt> u = union l1, l2;
grunt> describe u;
u: {a: chararray,t: (a: int,c: long)}

-- in the result of u, only the rows originating from l1 will correspond to the schema shown in describe.

MapReduce node 1-206
Map Plan
u: Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-203
|
|---u: Union[bag] - 1-202
    |
    |---l1: New For Each(false,false)[bag] - 1-195
    |   |   |
    |   |   Cast[chararray] - 1-192
    |   |   |
    |   |   |---Project[bytearray][0] - 1-191
    |   |   |
    |   |   Cast[tuple:(int,long)] - 1-194
    |   |   |
    |   |   |---Project[bytearray][1] - 1-193
    |   |
    |   |---l1: Load(/tmp/f1:org.apache.pig.builtin.PigStorage) - 1-190
    |
    |---l2: New For Each(false,false)[bag] - 1-201
        |   |
        |   Cast[chararray] - 1-198
        |   |
        |   |---Project[bytearray][0] - 1-197
        |   |
        |   Cast[tuple:(int,int)] - 1-200
        |   |
        |   |---Project[bytearray][1] - 1-199
        |
        |---l2: Load(/tmp/f1:org.apache.pig.builtin.PigStorage) - 1-196--------
Global sort: false
----------------
{code}

> use same logic for merging inner schemas in "default union" and "union onschema"
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1536
>                 URL: https://issues.apache.org/jira/browse/PIG-1536
>             Project: Pig
>          Issue Type: Task
>            Reporter: Thejas M Nair
>             Fix For: 0.9.0
>
>
> We should consider using the same logic for merging inner schemas in the two different types of union.
> In case of 'default union', it merges the two inner schemas of bags/tuples by position if the number of fields is the same and the corresponding types are compatible.
> In case of 'union onschema', it considers tuples/bags with different inner schemas to be incompatible types.
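
The two behaviours described in the issue can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not Pig's implementation: the names `merge_type`, `field_types`, `NUMERIC_RANK` and the `merge_inner` flag are hypothetical, the numeric widening order is assumed, and the merged field keeps the first input's alias simply because that matches the `describe u` output above.

```python
# Illustrative model only; this is NOT Pig's implementation.
# A simple type is a string; a tuple type is a list of (alias, type) pairs.

# Assumed widening order for numeric types (hypothetical, for illustration).
NUMERIC_RANK = {"int": 0, "long": 1, "float": 2, "double": 3}


def merge_type(t1, t2, merge_inner):
    """Return the merged type of t1 and t2, or None if they are incompatible.

    merge_inner=True  models 'default union': inner schemas are merged by position.
    merge_inner=False models 'union onschema': differing inner schemas are incompatible.
    """
    if isinstance(t1, list) and isinstance(t2, list):
        if t1 == t2:
            return t1
        if not merge_inner or len(t1) != len(t2):
            return None
        merged = []
        for (alias1, f1), (_alias2, f2) in zip(t1, t2):
            f = merge_type(f1, f2, merge_inner)
            if f is None:
                return None
            merged.append((alias1, f))  # keep the first input's alias
        return merged
    if isinstance(t1, list) or isinstance(t2, list):
        return None  # a tuple type never merges with a simple type
    if t1 == t2:
        return t1
    if t1 in NUMERIC_RANK and t2 in NUMERIC_RANK:
        return max(t1, t2, key=NUMERIC_RANK.get)  # widen to the larger numeric type
    return None


def field_types(schema):
    """Field types of a tuple schema, ignoring aliases."""
    return [t for _alias, t in schema]


# Inner schemas of column t from the grunt session above.
l1_t = [("a", "int"), ("c", "long")]
l2_t = [("a", "int"), ("b", "int")]

merged = merge_type(l1_t, l2_t, merge_inner=True)   # positional merge succeeds
strict = merge_type(l1_t, l2_t, merge_inner=False)  # treated as incompatible

# Rows from l2 still carry (int, int) although the merged schema says (int, long),
# so they would need an explicit cast to match what describe reports.
l2_needs_cast = field_types(l2_t) != field_types(merged)
```

In this model the positional merge yields `(a: int, c: long)`, the same inner schema that `describe u` reports, while the strict variant rejects the pair. The `l2_needs_cast` flag marks the gap the comment points out: the merged schema is computed, but rows from `l2` are only cast to `tuple:(int,int)` in the plan.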