From: "Alan Gates (JIRA)"
To: pig-dev@incubator.apache.org
Date: Fri, 10 Oct 2008 10:17:44 -0700 (PDT)
Subject: [jira] Commented: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

    [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638594#action_12638594 ]

Alan Gates commented on PIG-476:
--------------------------------

Pig Latin lets you define constructor arguments for a UDF:

{code}
define MyDateExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor("MM-dd-yyyy");
...
A = FOREACH raw GENERATE MyDateExtractor(dayTime);
{code}

Whatever you pass as an argument in the define statement is passed to the constructor of the UDF. In your case, this would allow you to pass the date format up front, parse it once, and avoid parsing it on every tuple passed to the UDF. This should give you a significant performance boost.

The downside is that if you want to use the same UDF with different date formats in the same query, you'd have to define a separate alias for each format. It's up to you whether to choose flexibility or performance here.

> given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-476
>                 URL: https://issues.apache.org/jira/browse/PIG-476
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Earl Cahill
>         Attachments: DateExtractor-PIG-476
>
>
> Want to be able to do something like
> A = FOREACH raw GENERATE org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime, "yyyy", "dd/MMM/yyyy:HH:mm:ss");
> to extract the year, or if your date is formatted as
> dd/MMM/yyyy:HH:mm:ss Z
> you could do something like
> A = FOREACH raw GENERATE org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime, "MM-dd-yyyy");
> to grab out the day
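
To illustrate the trade-off in the comment above (build the date format once in the constructor rather than on every tuple), here is a minimal plain-Java sketch. It is not the piggybank DateExtractor and does not use Pig's UDF API; the class name DateFieldExtractor, its two-pattern constructor, and the extract method are hypothetical names chosen for illustration only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical stand-in for a UDF. A real Pig UDF would extend Pig's
// EvalFunc class; this sketch only shows the constructor-argument idea.
final class DateFieldExtractor {
    // Both formats are built once, when the object is constructed,
    // instead of once per input value.
    // Note: SimpleDateFormat is not thread-safe, so one instance of this
    // class should not be shared across threads.
    private final SimpleDateFormat inputFormat;
    private final SimpleDateFormat outputFormat;

    DateFieldExtractor(String inputPattern, String outputPattern) {
        this.inputFormat = new SimpleDateFormat(inputPattern, Locale.US);
        this.outputFormat = new SimpleDateFormat(outputPattern, Locale.US);
    }

    // Parses one raw date string and re-renders it with the output pattern.
    // Returns null when the input does not match the input pattern.
    String extract(String raw) {
        try {
            Date parsed = inputFormat.parse(raw);
            return outputFormat.format(parsed);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

With this shape, one instance is needed per combination of patterns, for example new DateFieldExtractor("dd/MMM/yyyy:HH:mm:ss", "yyyy") for the year and a second instance with "dd" for the day. That mirrors the need for one define alias per date format in the same query.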