From: "Viraj Bhat (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: pig-dev@hadoop.apache.org
Date: Sat, 28 Mar 2009 02:15:51 -0700 (PDT)
Subject: [jira] Updated: (PIG-738) Regexp passed from pigscript fails in UDF
Message-ID: <494816771.1238231751325.JavaMail.jira@brutus>
In-Reply-To: <747478239.1238231630465.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

     [ https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-738:
---------------------------

    Attachment:
regexpinput.txt, myregexp.jar, RegexGroupCount.java

Java source, JAR for the UDF, and input file

> Regexp passed from pigscript fails in UDF
> -------------------------------------------
>
>                 Key: PIG-738
>                 URL: https://issues.apache.org/jira/browse/PIG-738
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.3.0
>            Reporter: Viraj Bhat
>             Fix For: 0.3.0
>
>         Attachments: myregexp.jar, RegexGroupCount.java, regexpinput.txt
>
>
> Consider a Pig script that counts matches of a regular expression in a text file.
> The regular expression supplied in the Pig script needs to escape the "." (dot) character.
> {code}
> register myregexp.jar;
> -- pattern not picked up
> define minelogs ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports');
> A = load '/user/viraj/regexpinput.txt' using PigStorage() as (source : chararray);
> B = foreach A generate minelogs(source) as sportslogs;
> dump B;
> {code}
> Snippet of the UDF RegexGroupCount.java:
> {code}
> public class RegexGroupCount extends EvalFunc<Integer> {
>     private final Pattern pattern_;
>     public RegexGroupCount(String patternStr) {
>         System.out.println("My pattern supplied is "+patternStr);
>         System.out.println("Equality test "+patternStr.equals("www\\.yahoo\\.com/sports"));
>         pattern_ = Pattern.compile(patternStr, Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
>     }
>     public Integer exec(Tuple input) throws IOException {
>     }
> }
> {code}
> Running the above script on the following dataset:
> ====================================================================================================
> dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl
> kas;dka;sd
> jsjsjwww.yahoo.com/sports
> jsdLSJDcom/sports
> wwwJyahooMcom/sports
> ====================================================================================================
> results in the following output:
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> Userfunc: (Name: UserFunc viraj-Sat Mar 28 02:06:31 PDT 2009-14 function: ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports') Operator Key: viraj-Sat Mar 28 02:06:31 PDT 2009-14)
> Userfunc fs: int
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> My pattern supplied is www\\.yahoo\\.com/sports
> Equality test false
> 2009-03-28 02:06:43,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2009-03-28 02:06:43,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> (0)
> (0)
> (0)
> (0)
> (0)
> ====================================================================================================
> In essence, there seems to be no way of passing this type of constructor argument through the Pig script. The only workaround seems to be hard-coding the values in the UDF!
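
The "Equality test false" lines in the report suggest that the string reaching the constructor still contains both backslashes from the script, whereas the Java literal "www\\.yahoo\\.com/sports" denotes a single backslash before each dot. The stand-alone sketch below reproduces that mismatch outside Pig. It assumes that Pig hands the quoted argument to the UDF constructor character for character, which the printed output implies but which is not confirmed here. The class name EscapeDemo and the helper countMatches are hypothetical and are not part of the attached UDF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Counts non-overlapping matches, compiled with the same flags as the UDF snippet.
    static int countMatches(String patternStr, String line) {
        Pattern p = Pattern.compile(patternStr, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The literal used in the equality test: ONE backslash before each dot at runtime.
        String javaLiteral = "www\\.yahoo\\.com/sports";
        // Assumption: what the constructor received, judging by the printed pattern.
        // TWO backslashes before each dot at runtime.
        String fromPig = "www\\\\.yahoo\\\\.com/sports";

        String line = "jsjsjwww.yahoo.com/sports"; // third row of the input file

        System.out.println(javaLiteral.equals(fromPig));      // false
        System.out.println(countMatches(javaLiteral, line));  // 1
        System.out.println(countMatches(fromPig, line));      // 0
    }
}
```

With two backslashes, the regex engine reads each `\\.` as a literal backslash followed by any character, so no row of the input can match and every count is 0. If the assumption above holds, writing the pattern with a single backslash in the script ('www\.yahoo\.com/sports') may be worth trying before hard-coding the pattern in the UDF.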