From: Bennie Leo
To: user@hive.apache.org
Subject: Optimizing UDF
Date: Tue, 14 Jul 2015 13:37:30 -0700

Hi,

I'm trying to optimize a UDF that runs very slowly on Hive. The UDF takes in a 5GB table and builds a large data structure out of it to facilitate lookups. The 5GB input is loaded into the distributed cache with an 'add file <path>' command, and the UDF builds the data structure only once per instance (or at least it should).

My problem is that the Hive UDF takes several hours to complete, while running the exact same code on my local machine takes 5 minutes! What could be causing Hive to be so impractically slow? According to the Hive logs, the data transfer takes 5-10 minutes, which is reasonable. What else is taking so long?

Thanks,
B
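A minimal Java sketch of the "build the lookup structure once" part of the approach described in the message. It assumes the cached file is tab-separated key/value text and that the file added with 'add file <path>' can be opened by its base name from the task's working directory (how the distributed cache usually exposes it; treat this as an assumption). `LookupCache`, `get` and `loadCount` are hypothetical names, not Hive APIs. Keeping the map in a static field means every UDF instance in the same JVM shares one copy, so additional instances do not repeat the expensive build; a per-instance field gives no such guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Hive): builds the lookup map at most once per JVM,
// however many UDF instances are created inside that JVM.
final class LookupCache {
    // Shared by every caller in this JVM; volatile makes the double-checked locking safe.
    private static volatile Map<String, String> table;
    // How many times the file was actually parsed (guarded by LookupCache.class).
    private static int loads = 0;

    private LookupCache() {}

    // Returns the cached map, parsing fileName only on the first call.
    // Later calls ignore fileName and return the map that was already built.
    static Map<String, String> get(String fileName) {
        Map<String, String> t = table;
        if (t == null) {
            synchronized (LookupCache.class) {
                t = table;
                if (t == null) {
                    t = load(fileName);
                    table = t;
                }
            }
        }
        return t;
    }

    static synchronized int loadCount() {
        return loads;
    }

    // Assumed file format: one "key<TAB>value" pair per line.
    // Lines with no tab, or with an empty key, are skipped.
    private static Map<String, String> load(String fileName) {
        Map<String, String> m = new HashMap<>();
        try (BufferedReader r = Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0) {
                    m.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        loads++; // only ever called while holding the LookupCache.class lock
        return Collections.unmodifiableMap(m);
    }
}
```

Inside the UDF's `evaluate` method, a lookup would then be something like `LookupCache.get("lookup.tsv").get(key)`, where `lookup.tsv` is a placeholder for the cached file's name. Each new task JVM still pays for one build, so this removes repeated builds within a JVM but not across tasks.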