hive-dev mailing list archives

From "Antonio Piccolboni (JIRA)" <>
Subject [jira] [Created] (HIVE-11949) "LOCAL" in LOAD DATA LOCAL INPATH means "remote"
Date Thu, 24 Sep 2015 20:44:04 GMT
Antonio Piccolboni created HIVE-11949:

             Summary: "LOCAL" in LOAD DATA LOCAL INPATH means "remote"
                 Key: HIVE-11949
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 1.2.1
            Reporter: Antonio Piccolboni

Originally filed as SPARK-10804; I checked that it affects Hive in HDP2 as well.
When connecting to a remote thriftserver with a custom JDBC client or beeline, LOAD DATA LOCAL
INPATH fails. The HiveServer2 docs ([],
JDBC client sample code) explain in a quick comment that local now means local to the server.
I think this is just a rationalization for a bug. When a user types "local":

# it needs to be local to him, not some server 
# Failing 1., one needs to have a way to determine what local means and create a "local" item
under the new definition. 
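
The failing scenario can be sketched as a small JDBC client. This is only an illustration under assumptions: the host name, port, file path and table name below are hypothetical, and a Hive JDBC driver is assumed to be on the classpath. The path in the statement exists on the machine running the client, but per the HiveServer2 docs it is looked up on the server host, so against a remote server the load fails.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class LoadLocalRepro {
    // Builds the statement discussed in this report.
    // No quoting or escaping is attempted; this is a sketch, not production code.
    static String buildLoadStatement(String path, String table) {
        return "LOAD DATA LOCAL INPATH '" + path + "' INTO TABLE " + table;
    }

    // Sends the statement over an existing JDBC connection.
    // The path is interpreted by the server, not by this client.
    static void loadLocal(Connection conn, String path, String table) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(buildLoadStatement(path, table));
        }
    }

    public static void main(String[] args) throws SQLException {
        String path = "/tmp/kv1.txt";   // hypothetical file on the client machine
        String table = "testtable";     // hypothetical table
        if (args.length == 0) {
            // No JDBC URL given: just show the statement that would be sent.
            System.out.println(buildLoadStatement(path, table));
            return;
        }
        // args[0] is a JDBC URL, e.g. jdbc:hive2://remote-host:10000/default (hypothetical)
        try (Connection conn = DriverManager.getConnection(args[0], "", "")) {
            loadLocal(conn, path, table);
        }
    }
}
```

Run without arguments, the program only prints the statement; run with a JDBC URL, it attempts the load against that server.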

With the thriftserver, I have a host to connect to, but I don't have any way to create a file
local to that host, at least in Spark. It may not be desirable to create user directories
on the thriftserver host or to run file transfer services like scp. Moreover, it appears
that this syntax is unique to Hive, but its origin can be traced to LOAD DATA LOCAL INFILE
in Oracle, which was adopted by MySQL. In the latter's docs we can read "If LOCAL is specified,
the file is read by the client program on the client host and sent to the server. The file
can be given as a full path name to specify its exact location. If given as a relative path
name, the name is interpreted relative to the directory in which the client program was started".
This is not to say that the Hive team is bound to what Oracle and MySQL do, but to support
the idea that the meaning of LOCAL is settled. For instance, the Impala documentation says:
"Currently, the Impala LOAD DATA statement only imports files from HDFS, not from the local
filesystem. It does not support the LOCAL keyword of the Hive LOAD DATA statement." I think
this is a better solution, if true client locality cannot be implemented. The way things
are in thriftserver, I developed a client under the assumption that I could use LOAD DATA
LOCAL INPATH and all tests were passing in standalone mode, only to find with the first distributed
test that 
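
For comparison, the client-side meaning of LOCAL quoted from the MySQL docs above (a full path is used as given, a relative path is interpreted relative to the directory in which the client program was started) could be sketched roughly as follows. The class and method names are hypothetical, and this is not how any Hive or MySQL client is actually implemented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ClientLocalPath {
    // Resolves a file name the way the quoted MySQL text describes:
    // absolute names are kept, relative names are resolved against
    // the directory in which the client program was started.
    static Path resolveOnClient(String name, Path startDir) {
        Path p = Paths.get(name);
        return (p.isAbsolute() ? p : startDir.resolve(p)).normalize();
    }

    public static void main(String[] args) {
        // "user.dir" is the JVM's working directory at startup.
        Path startDir = Paths.get(System.getProperty("user.dir"));
        System.out.println(resolveOnClient("data/kv1.txt", startDir));
    }
}
```

Under this reading, the client would read the resolved file itself and send its contents to the server, so no file needs to exist on the server host.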

