From: "Jie Li (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: dev@pig.apache.org
Date: Thu, 24 May 2012 02:11:41 +0000 (UTC)
Subject: [jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema
Message-ID: <886624697.14566.1337825502003.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <965507544.45616.1336587710709.JavaMail.tomcat@hel.zones.apache.org>

    [ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282128#comment-13282128 ]

Jie Li commented on PIG-2691:
-----------------------------

As there was no documentation on the field schema of TOKENIZE, can we assume that a user who wants to use the field name would explicitly name
it with AS? If so, then this change wouldn't break such scripts.

> Duplicate TOKENIZE schema
> -------------------------
>
>                 Key: PIG-2691
>                 URL: https://issues.apache.org/jira/browse/PIG-2691
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Jie Li
>              Labels: simple
>         Attachments: PIG-2691.patch, PIG-2691.patch.2
>
>
> TOKENIZE produces a schema with a fixed name, which results in duplicate aliases if it is used more than once in the same GENERATE statement.
> We could parameterize the schema on the name of the field being tokenized.
> {code}
> grunt> q = LOAD 'file' AS (source:chararray, target:chararray);
> grunt> e = FOREACH q GENERATE TOKENIZE(source), TOKENIZE(target);
> 2012-05-09 20:18:37,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> Duplicate schema alias: bag_of_tokenTuples
> grunt> e = FOREACH q GENERATE TOKENIZE(source) as s_entities, TOKENIZE(target) as t_entities;
> grunt> describe e
> e: {s_entities: {tuple_of_tokens: (token: chararray)},t_entities: {tuple_of_tokens: (token: chararray)}}
> {code}
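
To make the assumption in the comment concrete, here is a minimal sketch of a script that names each TOKENIZE output with AS and refers only to those names downstream. It reuses the relations and aliases from the issue description (q, e, s_entities, t_entities); the relations s and t and the aliases s_token and t_token are hypothetical names added for illustration. A script written this way should not depend on whatever default alias TOKENIZE assigns, though this has not been checked against the attached patches.

```pig
-- Load the two chararray columns, as in the issue description.
q = LOAD 'file' AS (source:chararray, target:chararray);

-- Give each TOKENIZE result an explicit alias so the two bags
-- do not collide on the default alias.
e = FOREACH q GENERATE TOKENIZE(source) AS s_entities,
                       TOKENIZE(target) AS t_entities;

-- Downstream statements use only the explicit aliases.
-- s, t, s_token and t_token are illustrative names.
s = FOREACH e GENERATE FLATTEN(s_entities) AS s_token;
t = FOREACH e GENERATE FLATTEN(t_entities) AS t_token;
```

A script that instead refers to the undocumented default alias (bag_of_tokenTuples in the error message above) is the case that a change to the default naming could affect.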