Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
Message-ID: <5592AC7D.4030003@oracle.com>
Date: Tue, 30 Jun 2015 10:49:33 -0400
From: gabriel balan <gabriel.balan@oracle.com>
Organization: Oracle Corporation
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: user@hive.apache.org
Subject: Re: Can't access file in Distributed Cache in Hive 1.1.0
References: 
 <CABVU3FEf9LsZVEYP4=Ku=POw6PCbot8kOfz6ETVWDMGquaBTVw@mail.gmail.com>
	<CABVU3FHxT74tQxUTn06a842X-si=1rPrNTUR-SR8s7XX8obYow@mail.gmail.com>
	<2412D3F0-281B-4710-A7E7-070683864F6A@hortonworks.com>
 <CABVU3FGx+ONkwMRjZKhFD1rBt-HCkJ588ocqEMezsNt=L9v9eA@mail.gmail.com>
In-Reply-To: 
 <CABVU3FGx+ONkwMRjZKhFD1rBt-HCkJ588ocqEMezsNt=L9v9eA@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------000308040503010402080206"

This is a multi-part message in MIME format.
--------------000308040503010402080206
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi

Try running "set hive.fetch.task.conversion=minimal;" in the Hive CLI before the query, so that it runs as an MR job rather than a local fetch task.
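A sketch of the whole CLI session, assuming the table a and the file hdfs:///tmp/testfile from your original message. The setting only applies to the current session, so issue it in the same session and before the query:

```sql
-- Restrict fetch-task conversion to the minimal set (SELECT *, filters on
-- partition columns, LIMIT); a query that calls a UDF then runs as an MR job.
set hive.fetch.task.conversion=minimal;

-- Ship the file through the distributed cache; inside the MR task it should
-- be reachable relative to the task's working directory.
add file hdfs:///tmp/testfile;

select in_file('a', './testfile') from a;
```

If "explain" still shows no Map Reduce stage for the query, the value none (accepted since Hive 0.14.0) disables the conversion entirely.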

hth
Gabriel Balan

On 6/30/2015 5:22 AM, Zsolt Tóth wrote:
> Thank you for your answer. The plans are identical for Hive 1.0.0 and Hive 1.1.0.
>
> You're right, Hive-1.1.0 does not start a MapReduce job for the query, while Hive-1.0.0 does. Should I file a JIRA for this issue?
>
> 2015-05-07 21:17 GMT+02:00 Jason Dere <jdere@hortonworks.com <mailto:jdere@hortonworks.com>>:
>
>     Is this on Hive CLI, or using HiveServer2?
>
>     Can you run "explain select in_file('a', './testfile') from a;" in both Hive 1.0.0 and Hive 1.1.0 and see whether the plans differ?
>     One possible explanation is that in Hive-1.1.0 this query is being executed without a map/reduce job, in which case the working directory for the query is probably the local directory from which Hive was invoked. I don't think the Distributed Cache will work correctly in this case, because the UDF is not running in a map/reduce task.
>
>     If a map-reduce job is kicked off for the query and the UDF is running in this m/r task environment, then the distributed cache will likely be working fine.
>
>     If there is a way to ensure that the query with your UDF runs as part of a map/reduce job, this may do the trick. Adding an ORDER BY will do it, but other people on this list may have a better way of making this happen.
>
>
>
>     On May 7, 2015, at 3:28 AM, Zsolt Tóth <toth.zsolt.bme@gmail.com <mailto:toth.zsolt.bme@gmail.com>> wrote:
>
>>     Does this error occur for anyone else? It might be a serious issue.
>>
>>     2015-05-05 13:59 GMT+02:00 Zsolt Tóth <toth.zsolt.bme@gmail.com <mailto:toth.zsolt.bme@gmail.com>>:
>>
>>         Hi,
>>
>>         I've just upgraded to Hive 1.1.0 and it looks like there is a problem with the distributed cache.
>>         I use ADD FILE, then a UDF that reads the file. The following syntax works in Hive 1.0.0, but in 1.1.0 Hive can't find the file (testfile exists on HDFS; the built-in UDF in_file is just an example):
>>
>>         add file hdfs:///tmp/testfile;
>>         select in_file('a', './testfile') from a;
>>
>>         However, it works with the local path:
>>
>>         select in_file('a', '/tmp/462e6854-10f3-4a68-a290-615e6e9d60ff_resources/testfile') from a;
>>
>>         When I try to list the files in the directory "./" in Hive 1.1.0, it lists the cluster's root directory. It looks like the working directory changed in Hive 1.1.0. Is this intended? If so, how can I access the files in the distributed cache added with ADD FILE?
>>
>>         Regards,
>>         Zsolt
>>
>>
>
>

-- 
The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.


--------------000308040503010402080206
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi<br>
    <br>
    Try running "set hive.fetch.task.conversion=minimal;" in the Hive CLI
    before the query, so that it runs as an MR job rather than a local
    fetch task.<br>
    <br>
    hth<br>
    Gabriel Balan<br>
    <br>
    <div class="moz-cite-prefix">On 6/30/2015 5:22 AM, Zsolt Tóth wrote:<br>
    </div>
    <blockquote
cite="mid:CABVU3FGx+ONkwMRjZKhFD1rBt-HCkJ588ocqEMezsNt=L9v9eA@mail.gmail.com"
      type="cite">
      <div dir="ltr">Thank you for your answer. The plans are identical
        for Hive 1.0.0 and Hive 1.1.0.
        <div><br>
          <div>You're right, Hive-1.1.0 does not start a MapReduce job
            for the query, while Hive-1.0.0 does. Should I file a JIRA
            for this issue?</div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2015-05-07 21:17 GMT+02:00 Jason Dere <span
            dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:jdere@hortonworks.com" target="_blank">jdere@hortonworks.com</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div style="word-wrap:break-word">
              <div>Is this on Hive CLI, or using HiveServer2?</div>
              <div><br>
              </div>
              <div>Can you run "explain select in_file('a',
                './testfile') from a;" in both Hive 1.0.0 and Hive
                1.1.0 and see whether the plans differ?</div>
              <div>One possible explanation is that in Hive-1.1.0 this
                query is being executed without a map/reduce job, in
                which case the working directory for the query is
                probably the local directory from which Hive was
                invoked. I don't think the
                Distributed Cache will be working correctly in this
                case, because the UDF is not running in a map/reduce
                task.</div>
              <div><br>
              </div>
              <div>If a map-reduce job is kicked off for the query and
                the UDF is running in this m/r task environment, then
                the distributed cache will likely be working fine.</div>
              <div><br>
              </div>
              <div>If there is a way to ensure that the query with your
                UDF runs as part of a map/reduce job, this may do the
                trick. Adding an ORDER BY will do it, but other people
                on this list may have a better way of making this happen.</div>
              <div>
                <div class="h5">
                  <div><br>
                  </div>
                  <div><br>
                  </div>
                  <br>
                  <div>
                    <div>On May 7, 2015, at 3:28 AM, Zsolt Tóth &lt;<a
                        moz-do-not-send="true"
                        href="mailto:toth.zsolt.bme@gmail.com"
                        target="_blank">toth.zsolt.bme@gmail.com</a>&gt;
                      wrote:</div>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">Does this error occur for anyone
                        else? It might be a serious issue.</div>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">2015-05-05 13:59
                          GMT+02:00 Zsolt Tóth <span dir="ltr">&lt;<a
                              moz-do-not-send="true"
                              href="mailto:toth.zsolt.bme@gmail.com"
                              target="_blank">toth.zsolt.bme@gmail.com</a>&gt;</span>:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div dir="ltr">Hi,
                              <div><br>
                              </div>
                              <div>I've just upgraded to Hive 1.1.0 and
                                it looks like there is a problem with
                                the distributed cache.</div>
                              <div>I use ADD FILE, then a UDF that
                                reads the file. The following syntax
                                works in Hive 1.0.0, but in 1.1.0 Hive
                                can't find the file (testfile exists on
                                HDFS; the built-in UDF in_file is just
                                an example):</div>
                              <div><br>
                              </div>
                              <div>
                                <div>add file <a moz-do-not-send="true">hdfs:///tmp/testfile</a>;</div>
                                <div>select in_file('a', './testfile')
                                  from a;</div>
                              </div>
                              <div><br>
                              </div>
                              <div>However, it works with the local
                                path:</div>
                              <div><br>
                              </div>
                              <div>select in_file('a',
                                '/tmp/462e6854-10f3-4a68-a290-615e6e9d60ff_resources/testfile')
                                from a;</div>
                              <div><br>
                              </div>
                              <div>When I try to list the files in the
                                directory "./" in Hive 1.1.0, it lists
                                the cluster's root directory. It looks
                                like the working directory changed in
                                Hive 1.1.0. Is this intended? If so, how
                                can I access the files in the
                                distributed cache added with ADD FILE?</div>
                              <div><br>
                              </div>
                              <div>Regards,</div>
                              <div>Zsolt</div>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="99999999">-- 
The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.</pre>
  </body>
</html>

--------------000308040503010402080206--