From: Rahul Sharma
To: user@hive.apache.org
Date: Tue, 25 Aug 2015 14:48:56 -0700
Subject: Re: UDF Configure method not getting called

Or alternatively, is there a way to pass configuration without using the
configure method?

The configuration passed to the UDF is essentially a list of parameters
that tells the UDF what it should morph into this time and what kind of
work it should perform. If there is an all-encompassing way to do that,
then I can modify the UDF to run regardless of whether it runs locally or
with a MapRed context.

On Tue, Aug 25, 2015 at 2:44 PM, Rahul Sharma <kippy.pie@gmail.com> wrote:

> Oh, thanks for the reply, Jason. That was my suspicion too.
>
> The UDF in our case is not a function per se, in the pure mathematical
> sense of the word 'function', because it doesn't take in a value and
> give out another value. It has side effects that form the input for
> another MapReduce job. The point of doing it this way is that we wanted
> to make use of the parallelism afforded by running it as a MapReduce job
> via Hive, as the processing is fairly compute-intensive.
>
> Is there a way to force map-reduce jobs? I think setting
> hive.fetch.task.conversion to minimal might help; is there anything else
> that can be done?
>
> Thanks a ton.
>
> On Tue, Aug 25, 2015 at 2:36 PM, Jason Dere <jdere@hortonworks.com> wrote:
>
>> There might be a few cases where a UDF is executed locally and not as
>> part of a Map/Reduce job:
>>
>> - Hive might choose not to run an M/R task for your query (see
>>   hive.fetch.task.conversion)
>>
>> - If the UDF is deterministic and has deterministic inputs, Hive might
>>   decide to run the UDF once to get the value and use constant folding
>>   to replace calls of that UDF with the value from the one UDF call
>>   (see hive.optimize.constant.propagation)
>>
>> Taking a look at the explain plan for your query might confirm this. In
>> those cases the UDF would not run within an M/R task and configure()
>> would not be called.
>>
>> ------------------------------
>> From: Rahul Sharma
>> Sent: Tuesday, August 25, 2015 11:32 AM
>> To: user@hive.apache.org
>> Subject: UDF Configure method not getting called
>>
>> Hi Guys,
>>
>> We have a UDF which extends GenericUDF and does some configuration
>> within the public void configure(MapredContext ctx) method.
>>
>> MapredContext in the configure method gives access to the
>> HiveConfiguration via JobConf, which contains custom attributes of the
>> form xy.abc.something. Reading these values is required for the
>> semantics of the UDF.
>>
>> Everything works fine up to Hive 0.13; however, with Hive 0.14 (or 1.0)
>> the configure method of the UDF is never called by the runtime, and
>> hence the UDF cannot configure itself dynamically.
>>
>> Is this the intended behavior? If so, what is the new way to read the
>> configuration of the MapReduce job within the UDF?
>>
>> I would be grateful for any help.
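
The two settings named in the quoted reply can be tried from a Hive session before changing any code. The fragment below is only a sketch: my_udf, my_table and col are placeholder names, and the accepted values should be checked against the documentation for the Hive version in use. As far as I recall, the default for hive.fetch.task.conversion moved from minimal to more in Hive 0.14, which would fit the 0.13 versus 0.14 difference described in this thread, but treat that as an assumption to verify.

```sql
-- Session-level settings; placeholders and values are assumptions to verify.
SET hive.fetch.task.conversion=none;           -- or minimal, as suggested in the thread
SET hive.optimize.constant.propagation=false;  -- only if constant folding is suspected

-- Inspect the plan to see whether the query still runs as a local fetch task.
EXPLAIN SELECT my_udf(col) FROM my_table;
```

If the plan shows a Map/Reduce stage after these settings, configure() should be reachable again for that query.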
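
On the question at the top of this message (passing configuration without configure()), one possible direction is to hand the parameters to the UDF as a constant string argument and parse them inside the UDF, so the same path works whether or not a MapRed context exists. The class below is a hypothetical helper, not part of Hive: the name UdfParams and the key=value;key=value format are assumptions made for illustration, and the wiring into a GenericUDF is deliberately left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses "key=value;key=value" parameters given to a UDF as a constant argument. */
final class UdfParams {
    private final Map<String, String> values;

    private UdfParams(Map<String, String> values) {
        this.values = Collections.unmodifiableMap(values);
    }

    /** Parses the spec; blank entries are skipped, entries without a key and '=' are rejected. */
    static UdfParams parse(String spec) {
        Map<String, String> parsed = new LinkedHashMap<>();
        if (spec != null) {
            for (String entry : spec.split(";")) {
                String trimmed = entry.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                int eq = trimmed.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
                }
                parsed.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
            }
        }
        return new UdfParams(parsed);
    }

    /** Returns the value for key, or fallback when the key is absent. */
    String get(String key, String fallback) {
        return values.getOrDefault(key, fallback);
    }
}
```

A call might then look like my_udf(col, 'xy.abc.mode=dedupe;xy.abc.batch=500'), where the function name, keys and values are again placeholders. The trade-off is that the parameters become visible in the query text instead of living in the job configuration.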