hive-user mailing list archives

From Zsolt Tóth <toth.zsolt....@gmail.com>
Subject Re: Can't access file in Distributed Cache in Hive 1.1.0
Date Tue, 30 Jun 2015 09:22:06 GMT
Thank you for your answer. The plans are identical for Hive 1.0.0 and Hive
1.1.0.

You're right, Hive 1.1.0 does not start a MapReduce job for the query,
while Hive 1.0.0 does. Should I file a JIRA for this issue?

2015-05-07 21:17 GMT+02:00 Jason Dere <jdere@hortonworks.com>:

>  Is this on Hive CLI, or using HiveServer2?
>
>  Can you run "explain select in_file('a', './testfile') from a;" on
> both Hive 1.0.0 and Hive 1.1.0 and see whether the plans differ?
> One possibility is that in Hive 1.1.0 this query is executed without a
> map/reduce job, in which case the working directory for the query is
> probably the local working directory from which Hive was invoked. I don't
> think the Distributed Cache will work correctly in this case, because the
> UDF is not running in a map/reduce task.
>
>  If a map-reduce job is kicked off for the query and the UDF is running
> in this m/r task environment, then the distributed cache will likely be
> working fine.
>
>  If there is a way to ensure the query with your UDF runs as part of a
> map/reduce job, this may do the trick.  Adding an order-by will do it, but
> other people on this list may have a better way of making this happen.
>
>
>
>  On May 7, 2015, at 3:28 AM, Zsolt Tóth <toth.zsolt.bme@gmail.com> wrote:
>
>  Does this error occur for anyone else? It might be a serious issue.
>
> 2015-05-05 13:59 GMT+02:00 Zsolt Tóth <toth.zsolt.bme@gmail.com>:
>
>> Hi,
>>
>>  I've just upgraded to Hive 1.1.0 and it looks like there is a problem
>> with the distributed cache.
>> I use ADD FILE, then a UDF that reads the file. The following
>> syntax works in Hive 1.0.0, but Hive can't find the file in 1.1.0 (testfile
>> exists on HDFS; the built-in UDF in_file is just an example):
>>
>>  add file hdfs:///tmp/testfile;
>> select in_file('a', './testfile') from a;
>>
>>  However, it works with the local path:
>>
>>  select in_file('a',
>> '/tmp/462e6854-10f3-4a68-a290-615e6e9d60ff_resources/testfile') from a;
>>
>>  When I try to list the files in the directory "./" in Hive 1.1.0, it
>> lists the cluster's root directory. It looks like the working directory
>> changed in Hive 1.1.0. Is this intended? If so, how can I access the files
>> in the distributed cache added with ADD FILE?
>>
>>  Regards,
>> Zsolt
>>
>
>
>
