From: gabriel balan
Organization: Oracle Corporation
Date: Tue, 30 Jun 2015 10:49:33 -0400
To: user@hive.apache.org
Subject: Re: Can't access file in Distributed Cache in Hive 1.1.0

Hi

Try "set hive.fetch.task.conversion=minimal;" in the Hive CLI to get an MR job rather than a local fetch task.
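
For readers following along, here is a minimal sketch of what that session might look like. It reuses the table `a`, the file `hdfs:///tmp/testfile`, and the built-in `in_file` call from the original report further down the thread. Whether the relative path then resolves as it did in Hive 1.0.0 is an assumption and has not been verified here.

```sql
-- Limit fetch-task conversion to trivial queries (SELECT *, filters on
-- partition columns, LIMIT). A query that calls a UDF should then be
-- planned as a MapReduce job instead of a local fetch task.
set hive.fetch.task.conversion=minimal;

-- Ship the file through the distributed cache, as in the original report.
add file hdfs:///tmp/testfile;

-- Assumption: inside the MR task the file is visible under the task's
-- working directory, so the relative path resolves.
select in_file('a', './testfile') from a;
```

The property also accepts `none`, which turns fetch-task conversion off entirely, and `more`. The setting applies to the current session only unless it is placed in the Hive configuration.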

hth
Gabriel Balan

On 6/30/2015 5:22 AM, Zsolt Tóth wrote:
Thank you for your answer. The plans are identical for Hive 1.0.0 and Hive 1.1.0.

You're right, Hive-1.1.0 does not start a MapReduce job for the query, while Hive-1.0.0 does. Should I file a JIRA for this issue?

2015-05-07 21:17 GMT+02:00 Jason Dere <jdere@hortonworks.com>:
Is this on Hive CLI, or using HiveServer2?

Can you run "explain select in_file('a', './testfile') from a;" from both Hive 1.0.0 and Hive 1.1.0 and see if they look different?
One possible explanation is that in Hive 1.1.0 this query is being executed without the need for a map/reduce job, in which case the working directory for the query is probably the local working directory from when Hive was invoked. I don't think the Distributed Cache will work correctly in this case, because the UDF is not running in a map/reduce task.

If a map-reduce job is kicked off for the query and the UDF is running in this m/r task environment, then the distributed cache will likely be working fine.

If there is a way to ensure the query with your UDF runs as part of a map/reduce job, this may do the trick. Adding an order-by will do it, but other people on this list may have a better way of making this happen.
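
The order-by workaround could be sketched roughly as follows. The alias `found` is a hypothetical name introduced only for illustration; the table, file, and UDF call are taken from the original report. Treat this as an untested sketch rather than a confirmed fix.

```sql
add file hdfs:///tmp/testfile;

-- Sorting cannot be served by a local fetch task, so the ORDER BY forces a
-- map/reduce job and the UDF runs inside a task environment.
-- Note: if hive.mapred.mode=strict is set, ORDER BY also requires a LIMIT.
select in_file('a', './testfile') as found
from a
order by found;
```

The cost of this approach is an extra sort that the query does not otherwise need, which is why a configuration-level switch may be preferable where one is available.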



On May 7, 2015, at 3:28 AM, Zsolt Tóth <toth.zsolt.bme@gmail.com> wrote:

Does this error occur for anyone else? It might be a serious issue.

2015-05-05 13:59 GMT+02:00 Zsolt Tóth <toth.zsolt.bme@gmail.com>:
Hi,

I've just upgraded to Hive 1.1.0 and it looks like there is a problem with the distributed cache.
I use ADD FILE, then a UDF that wants to read the file. The following syntax works in Hive 1.0.0 but Hive can't find the file in 1.1.0 (testfile exists on HDFS, the built-in UDF in_file is just an example):

add file hdfs:///tmp/testfile;
select in_file('a', './testfile') from a;

However, it works with the local path:

select in_file('a', '/tmp/462e6854-10f3-4a68-a290-615e6e9d60ff_resources/testfile') from a;

When I try to list the files in the directory "./" in Hive 1.1.0, it lists the cluster's root directory. It looks like the working directory changed in Hive 1.1.0. Is this intended? If so, how can I access the files in the distributed cache added with ADD FILE?

Regards,
Zsolt




-- 
The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.