hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Busbey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20077) hcat command should follow same pattern as hive cli for getting HBase jars
Date Tue, 03 Jul 2018 21:00:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Busbey updated HIVE-20077:
-------------------------------
    Status: Patch Available  (was: In Progress)

-v0
  - copy the hbase detection and jar finding from {{bin/hive}}
  - tested manually on a cluster with relevant HBase changes in place, via Pig running against
hcat.

> hcat command should follow same pattern as hive cli for getting HBase jars
> --------------------------------------------------------------------------
>
>                 Key: HIVE-20077
>                 URL: https://issues.apache.org/jira/browse/HIVE-20077
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 2.3.2, 0.14.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Major
>         Attachments: HIVE-20077.0.patch
>
>
> Currently the {{hcat}} command adds HBase jars to the classpath by using find to walk
the directories under {{$HBASE_HOME/lib}}.
> {code}
> # Look for HBase in a BigTop-compatible way. Avoid thrift version
> # conflict with modern versions of HBase.
> HBASE_HOME=${HBASE_HOME:-"/usr/lib/hbase"}
> HBASE_CONF_DIR=${HBASE_CONF_DIR:-"${HBASE_HOME}/conf"}
> if [ -d ${HBASE_HOME} ] ; then
>    for jar in $(find $HBASE_HOME -name '*.jar' -not -name '*thrift*'); do
>       HBASE_CLASSPATH=$HBASE_CLASSPATH:${jar}
>    done
>    export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HBASE_CLASSPATH}"
> fi
> if [ -d $HBASE_CONF_DIR ] ; then
>     HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HBASE_CONF_DIR}"
> fi
> {code}
> This is incorrect as that path contains jars for a mixture of purposes; hbase client
jars, hbase server jars, and hbase shell specific jars. The inclusion of unneeded jars is
mostly innocuous until the upcoming HBase 2.1.0 release. That release will have HBASE-20615
and HBASE-19735, which will mean most client facing installations will have a number of shaded
client artifacts present.
> With those changes in place, the current implementation will include in the hcat runtime
a mix of shaded and non-shaded hbase artifacts that include some Hadoop classes rewritten
to use a shaded version of protobuf. When these mix with other Hadoop classes in the classpath
that have not been rewritten hcat will fail with errors that look like this:
> {code}
> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto
cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:225)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy28.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:875)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>         at com.sun.proxy.$Proxy29.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1643)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1492)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1507)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1668)
>         at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:686)
>         at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:625)
>         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:557)
>         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524)
>         at org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.java:149)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
> {code}
> To fix this issue in a way that transparently handles HBase versions the hcat command
should take the same approach as the hive cli and ask HBase for the appropriate set of jars:
> {code}
>   # HBase detection. Need bin/hbase and a conf dir for building classpath entries.
>   # Start with BigTop defaults for HBASE_HOME and HBASE_CONF_DIR.
>   HBASE_HOME=${HBASE_HOME:-"/usr/lib/hbase"}
>   HBASE_CONF_DIR=${HBASE_CONF_DIR:-"/etc/hbase/conf"}
>   if [[ ! -d $HBASE_CONF_DIR ]] ; then
>     # not explicitly set, nor in BigTop location. Try looking in HBASE_HOME.
>     HBASE_CONF_DIR="$HBASE_HOME/conf"
>   fi
>   # perhaps we've located the HBase config. if so, include it on classpath.
>   if [[ -d $HBASE_CONF_DIR ]] ; then
>     export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HBASE_CONF_DIR}"
>   fi
>   # look for the hbase script. First check HBASE_HOME and then ask PATH.
>   if [[ -e $HBASE_HOME/bin/hbase ]] ; then
>     HBASE_BIN="$HBASE_HOME/bin/hbase"
>   fi
>   HBASE_BIN=${HBASE_BIN:-"$(which hbase)"}
>   # perhaps we've located HBase. If so, include its details on the classpath
>   if [[ -n $HBASE_BIN ]] ; then
>     # exclude ZK, PB, and Guava (See HIVE-2055)
>     # depends on HBASE-8438 (hbase-0.94.14+, hbase-0.96.1+) for `hbase mapredcp` command
>     for x in $($HBASE_BIN mapredcp 2>&2 | tr ':' '\n') ; do
>       if [[ $x == *zookeeper* || $x == *protobuf-java* || $x == *guava* ]] ; then
>         continue
>       fi
>       # TODO: should these should be added to AUX_PARAM as well?
>       export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${x}"
>     done
>   fi
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message